Change BASELINE to accommodate a list of points in addition to a single point #32

Closed
jpmoreux opened this issue Apr 7, 2015 · 24 comments

Comments

@jpmoreux
Member

jpmoreux commented Apr 7, 2015

Günter Mühlberger and Structify colleagues at University of Innsbruck would like to request using a list of points for the BASELINE instead of one single point. So changing from

<xsd:attribute name="BASELINE" type="xsd:float" use="optional"/>
to
<xsd:attribute name="BASELINE" type="PointsType" use="optional"/>

Moreover, for handwritten text it could be useful to have more than one BASELINE for a single text line, e.g. when text was crossed out and overwritten.

The first marked text below shows a line that logically has two baselines. The word above the line belongs logically to the same line, which is why we would like to allow several baselines for one line.

The second marked text shows why a baseline realised as a polyline is so important when dealing with handwritten or distorted text.

[Image: altobaselinerequest]

@cowboyMontana cowboyMontana changed the title Baseline Change BASELINE to accomodate a list of points in addition to a single point Apr 7, 2015
@artunit
Member

artunit commented Jul 20, 2017

Could BASELINE use SHAPE in the same way as STRING, etc., with the addition of POLYLINE to ShapeType?

@artunit artunit self-assigned this Jul 21, 2017
@artunit artunit changed the title Change BASELINE to accomodate a list of points in addition to a single point Change BASELINE to accommodate a list of points in addition to a single point Jul 21, 2017
@artunit
Member

artunit commented Oct 18, 2017

I realised that my comment sort of skipped a major implication: BASELINE becomes an element instead of an attribute. As I tried working through this more by hand, I see the relationship between TEXTLINE, BASELINE and STRING much better. If we have something like:

<xsd:element name="BASELINE" type="PointsType" minOccurs="0" maxOccurs="unbounded"/>

That would allow multiple BASELINES and capture the geometry of the line(s). If someone were calculating typographic/writing constructs like "descenders", it might be important to capture the thickness of the baseline; that's where the SHAPE question came in, but that's probably not necessary in the absence of an actual request.

@artunit
Member

artunit commented Nov 28, 2017

Just a quick update on this, I think we should go with the original request for now, i.e., use:

<xsd:attribute name="BASELINE" type="PointsType" use="optional"/>

It is true that there are many occasions where handwritten materials have multiple lines, but it gets really dicey when the second line stops half-way through a character and I am wondering if it's really a BASELINE at that point. I suspect that the "descender" aspect might come forward at some point but maybe we can avoid the weeds on some of this by starting with the smallest change and working from there.

@Jo-CCS
Member

Jo-CCS commented Jan 18, 2018

I doubt that the first sample, marked "Ex. 1", is a good example, as I wonder whether it should rather be described as a separate "TextLine" object. In any case, the second example clearly shows that the line is not exactly horizontal, so it needs to be possible to describe curved or sloped lines as well.

Furthermore, I would like to point out that an annotation is completely missing for this attribute; one is needed to state the exact intended usage and value.
In the other sample from Jean-Philip uploaded here, it was filled with the distance to the top of the page. I would expect the PointsType values to follow the same convention: absolute coordinates relative to the top-left corner of the page, like the other coordinates, not values relative to the TextLine object.

@artunit
Member

artunit commented Feb 11, 2018

Sorry to be so slow on this, Jo, I didn’t get pinged by email when your comment was added and missed this until now. In typesetting, the baseline is the line upon which a line of text rests. The concept carries over to handwriting but it can blur on inspection with things like underlining for emphasis (which is what I think is happening in Ex. 2, where the baseline intersects with the underline). I think it's technically not always possible to determine what the baseline is in handwriting, but the coordinates can define a line that seems to fulfill this function. To make this explicit:

<xsd:attribute name="BASELINE" type="PointsType" use="optional">
<xsd:annotation>
<xsd:documentation>A single line on which a line of text rests.</xsd:documentation>
</xsd:annotation>
</xsd:attribute>

@artunit
Member

artunit commented Mar 20, 2018

As per the Mar. 12, 2018 meeting, we use the typographic interpretation of BASELINE and define the coordinate orientation:

<xsd:attribute name="BASELINE" type="PointsType" use="optional">
   <xsd:annotation>
      <xsd:documentation>
         Pixel coordinates based on the left-hand top corner of an image 
         which define a single line on which a line of text rests.
      </xsd:documentation>
   </xsd:annotation>
</xsd:attribute>

@urieli

urieli commented May 23, 2018

It's not clear to me which version of the schema is being discussed here.
In version 4.0, we have only: https://github.com/altoxml/schema/blob/master/v4/alto-4-0.xsd#L906

However, there is no documentation to indicate what this float value represents: is it the vertical coordinate of the baseline at the TextLine's HPOS? Presumably, the containing TextBlock's ROTATION then makes it possible to deduce a line spanning the entire TextLine.

Changing this to a PointsType should make it possible to define a line explicitly (although PointsType documentation doesn't tell us how to encode the points as a string).

Note: I can easily imagine a baseline with more than two points, for book pages that were not scanned "flat", so that the text curves upwards near the inner margin.
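
To illustrate the curved-page case, here is a minimal Python sketch of how a consumer might evaluate a multi-point baseline at a given horizontal position. It assumes horizontal, left-to-right text and linear interpolation between points; the function name `baseline_y_at` and the sample coordinates are hypothetical and not part of ALTO.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def baseline_y_at(points: Sequence[Point], x: float) -> float:
    """Return the baseline's y value at horizontal position x.

    Assumption: the polyline describes roughly horizontal text, so the
    points can be ordered by x and interpolated linearly between neighbours.
    Positions outside the polyline are clamped to the nearest end point.
    """
    if not points:
        raise ValueError("baseline needs at least one point")
    pts: List[Point] = sorted(points)
    if x <= pts[0][0]:
        return pts[0][1]
    if x >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            if x1 == x0:  # vertical step: avoid division by zero
                return y0
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise AssertionError("unreachable: x lies inside the polyline's x range")


# Hypothetical page: flat on the left, curving upwards towards the margin.
curved = [(100, 500), (300, 500), (400, 480)]
y_mid_curve = baseline_y_at(curved, 350)
```

A two-point baseline is just the special case of a straight (possibly sloped) line, so the same code covers both situations.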

@artunit
Member

artunit commented Jun 7, 2018

The syntax from the comment on Mar. 20 is the latest attempt to address this issue and is not part of a schema yet. The syntax has to be agreed to by the ALTO Board before becoming part of a version, and it is still open to discussion. We are moving towards using PointsType, which is used for several SHAPES now. The syntax I have seen is the x coordinate followed by the y coordinate, e.g. 200 400 203 405 210 420, where something like [[200, 400], [203, 405], [210, 420]] or maybe (200,400),(203,405),(210,420) would be more explicit. This might be worth pursuing as a separate issue.
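
For readers who need to consume the flat "x y x y ..." form mentioned above, a minimal Python sketch of pairing the numbers into points could look like the following. The function name `parse_points` is hypothetical, and tolerating commas as separators is an assumption made here for convenience, not something the schema documentation states.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_points(value: str) -> List[Point]:
    """Split a flat 'x1 y1 x2 y2 ...' string into (x, y) pairs.

    Assumption: numbers are separated by whitespace; commas are also
    accepted as separators. An odd number of values is rejected because
    it cannot be paired into points.
    """
    tokens = value.replace(",", " ").split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected an even number of values, got {len(tokens)}")
    numbers = [float(token) for token in tokens]
    # Even positions are x coordinates, odd positions are y coordinates.
    return list(zip(numbers[0::2], numbers[1::2]))


points = parse_points("200 400 203 405 210 420")
```

The flat form is compact but relies on the reader knowing the x-then-y convention, which is the ambiguity the bracketed alternatives would remove.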

@artunit artunit closed this as completed Jun 7, 2018
@artunit artunit reopened this Jun 7, 2018
@artunit
Member

artunit commented Jun 7, 2018

Gah, I didn't mean to close this, reopened and I need more coffee!

@mittagessen
Contributor

mittagessen commented Feb 18, 2019

Is there a way to get this into the pipeline for inclusion in the next ALTO version?

@artunit
Member

artunit commented Feb 18, 2019

@mittagessen - I think so, it was generally agreed to at the Board level.

@artunit
Member

artunit commented Sep 28, 2019

As per the 2019-09-27 Board Meeting, the proposed attribute will be changed to use a polyline. This also means the issue is available for voting ACCEPT or REJECT.

<xsd:attribute name="BASELINE" type="PointsType" use="optional">
   <xsd:annotation>
      <xsd:documentation>
         Pixel coordinates based on the left-hand top corner of an image 
         which define a polyline on which a line of text rests.
      </xsd:documentation>
   </xsd:annotation>
</xsd:attribute>
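
As a rough illustration of what this means for consumers, the Python sketch below reads BASELINE from a TextLine and distinguishes the older single-float value from the proposed polyline. The sample fragments, the function name `read_baseline` and the returned tuple shape are hypothetical; namespaces are omitted for brevity, and the space-separated "x y" pairs follow the syntax quoted earlier in this thread rather than a normative definition.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple, Union

Point = Tuple[float, float]
Baseline = Tuple[str, Union[float, List[Point]]]


def read_baseline(text_line: ET.Element) -> Optional[Baseline]:
    """Return ('float', y) for a single value, ('polyline', points) for
    coordinate pairs, or None when BASELINE is absent or empty."""
    raw = text_line.get("BASELINE")
    if raw is None:
        return None
    numbers = [float(token) for token in raw.replace(",", " ").split()]
    if not numbers:
        return None
    if len(numbers) == 1:
        # Older files: one float, as in the xsd:float definition.
        return ("float", numbers[0])
    if len(numbers) % 2 != 0:
        raise ValueError("BASELINE has an odd number of coordinates")
    return ("polyline", list(zip(numbers[0::2], numbers[1::2])))


# Hypothetical, namespace-free fragments for demonstration only.
old_style = ET.fromstring('<TextLine ID="l1" BASELINE="412.0"/>')
new_style = ET.fromstring('<TextLine ID="l2" BASELINE="200 400 203 405 210 420"/>')

old_value = read_baseline(old_style)
new_value = read_baseline(new_style)
```

Returning a tagged tuple keeps the two interpretations apart, so calling code cannot mistake a lone vertical position for a point.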

@artunit
Member

artunit commented Sep 28, 2019

ACCEPT

@artunit artunit added 6 voting and removed 3 review labels Sep 28, 2019
@cowboyMontana
Member

ACCEPT

@splet

splet commented Dec 13, 2019

ACCEPT

@cneud
Member

cneud commented Dec 16, 2019

ACCEPT

@ntra00
Member

ntra00 commented Dec 16, 2019

ACCEPT

@jukervin
Member

ACCEPT

@cipriandinu
Member

ACCEPT

@hanyelsawy
Member

ACCEPT

@Ra1phM
Member

Ra1phM commented Dec 17, 2019

ACCEPT

@jpmoreux
Member Author

ACCEPT

@artunit
Member

artunit commented Feb 9, 2020

Slated for 4.2, will close when published.

@artunit
Member

artunit commented Sep 2, 2020

Added in v4.2, released August 2020.

@artunit artunit closed this as completed Sep 2, 2020