Process Result tracking (IMPACT) #27
Reviewing the original change request filed by the IMPACT project, it seems that two changes are requested:
Example:
<processingStep ID="ID005">
  <processingDateTime>2010-12-15T15:02:48</processingDateTime>
  <processingAgency>ACME Agency</processingAgency>
  <processingStepDescription>manual correction</processingStepDescription>
  <processingStepSettings>misc. settings</processingStepSettings>
  <processingSoftware>
    <softwareCreator>USAL</softwareCreator>
    <softwareName>Aletheia</softwareName>
    <softwareVersion>1.2.3</softwareVersion>
  </processingSoftware>
</processingStep>
<TextLine ID="ID069" STYLEREFS="ID007" BASELINE="1261" CORRECTEDBY="ID005" VPOS="1230" HPOS="260" HEIGHT="40" WIDTH="902">
Justification: "A lot of software tools and also human interactions are involved in different steps of the digitisation process. Each of them may affect an ALTO file by doing some refinements or corrections. From our point of view it would be desirable to keep track of the changes and verification done by the different agents which are involved in the digitisation process. This would allow a simple kind of document history and also gives important information about the trustworthiness of the whole document. If, for example, everything was verified by a service provider, then we can assume that the quality of the document is very high. Storing the old values as well as the new ones would increase the file size tremendously. Therefore we suggest storing only the information about what has been changed and by whom, without keeping track of the changed values."
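As a rough illustration of what the CORRECTEDBY reference buys a consumer, the following sketch resolves each corrected text line to the processing step that touched it. It reuses the element and attribute names from the example above; the `<root>` wrapper, the self-closed `<TextLine/>`, and the function name `corrections_by_step` are assumptions made only so the fragment parses and runs, and real ALTO files are namespaced, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Sample built from the names in the example above. The <root> wrapper and the
# self-closed <TextLine/> are assumptions added only so the fragment parses.
SAMPLE = """<root>
  <processingStep ID="ID005">
    <processingDateTime>2010-12-15T15:02:48</processingDateTime>
    <processingAgency>ACME Agency</processingAgency>
    <processingStepDescription>manual correction</processingStepDescription>
    <processingSoftware>
      <softwareName>Aletheia</softwareName>
    </processingSoftware>
  </processingStep>
  <TextLine ID="ID069" CORRECTEDBY="ID005"/>
</root>"""


def corrections_by_step(xml_text):
    """Map each corrected TextLine ID to (step ID, step description)."""
    root = ET.fromstring(xml_text)
    # Index the processing steps by their ID attribute.
    steps = {step.get("ID"): step for step in root.iter("processingStep")}
    result = {}
    for line in root.iter("TextLine"):
        step_id = line.get("CORRECTEDBY")
        if step_id is None:
            continue  # no CORRECTEDBY attribute: the line was not corrected
        step = steps.get(step_id)
        description = None
        if step is not None:
            description = step.findtext("processingStepDescription")
        result[line.get("ID")] = (step_id, description)
    return result


print(corrections_by_step(SAMPLE))
```

Only the reference is stored on the text line, so the old values are not recoverable; the lookup tells you who changed a line and when, not what it said before.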
A post-processing action such as a new layout analysis (as outlined in #36) will cause changes too large to track with such a method. On the other hand, it is a simple extension, will only be for optional use, and does not cause a structural issue. I would just shorten the attributes (CORR= / VERIFIED=) to also prevent data issues.
Continued in #39.
Champion: Clemens Neudecker
Submitter: Impact
Submitted: 2013-02
Status: discussion
submitted - initial status when proposal is submitted
discussion - proposal is being discussed within the board
review - xsd code is being reviewed
accepted - proposal is accepted
rejected - proposal is rejected
draft - accepted proposal is in public commenting period
published - proposal is published in a schema version
Backwards compatible ??
To ALTO version ?
Purpose
A lot of software tools and also human interactions are involved in different steps of the digitisation process. Each of them may affect an ALTO file by doing some refinements or corrections. From our point of view it would be desirable to keep track of the changes and verification done by the different agents which are involved in the digitisation process. This would allow a simple kind of document history and also gives important information about the trustworthiness of the whole document. If, for example, everything was verified by a service provider, then we can assume that the quality of the document is very high. Storing the old values as well as the new ones would increase the file size tremendously.
Correction and Validation are possible outcomes of the same process.
Implementation
The ALTO schema already defines a element. The intention of this element is to record any details about those process steps that were carried out after the creation of the full text. The element is optional and not part of the actual page’s definition in ALTO.
In order to store information about the correction and verification process for individual text lines, words etc. the following elements are added to the section:
• stores the type of process step. It is a free text field, though IMPACT internal constraints require the element’s value to be set to “correction”.
• groups all elements regarding the result of the process. The element’s value attribute contains information about the outcome of the process. The element is repeatable. Each element represents a specific outcome of the process that is recorded in the element’s value attribute. This attribute may only contain two values: “corrected” or “verified”.
• is an element that wraps around all elements that were processed with the actual result as stated in the element’s value attribute.
• element contains the ID value of an individual text line or word element. Unprocessed elements are not listed here.
If an element has not been processed, the element is not listed within .
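The lookup this structure implies can be sketched independently of the XML. The element names are not shown in the text above, so the sketch below models the data as a plain mapping from outcome value to a list of element IDs; `ALLOWED_OUTCOMES`, `index_outcomes`, and `outcomes_of` are hypothetical names introduced for illustration, not part of the proposal.

```python
# The two outcome values permitted by the description above.
ALLOWED_OUTCOMES = ("corrected", "verified")


def index_outcomes(results):
    """Turn {outcome: [element IDs]} into {element ID: set of outcomes}."""
    index = {}
    for outcome, element_ids in results.items():
        if outcome not in ALLOWED_OUTCOMES:
            raise ValueError(f"unexpected outcome: {outcome!r}")
        for element_id in element_ids:
            # A set is used because the text does not say whether one element
            # may appear under both outcomes.
            index.setdefault(element_id, set()).add(outcome)
    return index


def outcomes_of(element_id, index):
    """Return the recorded outcomes; an empty set means 'not processed'."""
    return index.get(element_id, set())


# Hypothetical IDs, used only to exercise the functions.
index = index_outcomes({"corrected": ["ID069"], "verified": ["ID070", "ID071"]})
print(outcomes_of("ID069", index))
print(outcomes_of("ID999", index))
```

Because unprocessed elements are simply absent, a reader cannot tell "not processed" apart from "processed by a step that recorded nothing"; the empty set covers both cases.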
Example:
Schema changes draft
Current schema Changed schema