ReadingOrder broken for recursive groups #15
Comments
The last XML snippet seems to be the correct way to do it.
Hi, I made a change, please test with the latest release.
I can confirm that with 1.4.06 everything works, including (a differently colored) arrow path within recursive groups. Thank you very much!
Okay, then maybe the documentation in the XSD should be amended to reflect that. Also, whether gaps in @kba I wonder how we can get our generateDS object model to serialise the different member lists (
Thanks for the doc changes (the diff on GitHub was a bit messy and hard to read, but I trust you).
That's because I did a second commit where I reversed the order in which the two sets of group types get introduced – from indexed variants first, unindexed second to unindexed first, indexed second (because they are easier to understand and referenced on top-level ReadingOrder). You should be fine looking at the commit diffs individually, the first commit in particular.
Well that's no small issue. In that case, contiguity is not required. Therefore, I should make another patch, replacing my statement about this by its inverse.
Of course, we cannot enforce these details syntactically. But it is important how PRImA libs handle/expect this, being the reference implementation. BTW we just have the unfortunate situation that the less strict interpretation of coordinate consistency in Aletheia causes our OCR-D validators to trigger tons of "false-positive" alarms about little polygon defects on annotations exported from Aletheia. (For example, small 1-pixel child-parent extrusions, rounding-related path self-intersections etc.) As soon as we have a systematic account of what sorts of errors can be fixed automatically, we will (make a post-processor and) report back.
Also, I wonder why the schema is so strict about the lower bounds of group size. That is, why is the subsequence (It would make implementations much easier if groups could just be empty as well. Otherwise they have to be converted between groups and simple RegionRefs depending on what content some RO detector returned.) @chris1010010 Give me a thumbs up if you would accept a PR making empty groups tolerable.
@bertsky RegionRef for groups was added later, so empty groups did not make sense before that. But even now, I think we should keep that restriction. I'll ask the others, but I can already guess the response ;-)
I see. Another question keeps swirling around in my head when I try to implement this in OCR-D Python: Are region IDs allowed to be referenced by multiple I am not asking whether there are already use-cases for this. (One can imagine people may want to have different "logics", like one purely page oriented including all footnotes and marginals, and another text-flow oriented only regarding the linear text body from page to page.) I just have to know whether this is allowed/intended to happen, and how PRImA's implementation handles this. (My current attempts always derive one global region ID dictionary from the recursive structure, which would of course fail if the answer was yes.)
It's meant to be a reading order tree, therefore a region should only be referenced once.
Understood. That makes it easier to implement. (See PRImA-Research-Lab/PAGE-XML#22)
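The "referenced only once" rule above can be checked mechanically. The following is only a minimal sketch, not how PRImA's or OCR-D's libraries do it: the function names are hypothetical, the sample XML omits the PAGE namespace for brevity, and it assumes that a group's own @regionRef counts as a reference just like a RegionRef or RegionRefIndexed does.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_region_refs(reading_order):
    """Return the @regionRef values used more than once below reading_order.

    Assumption: every element carrying @regionRef counts as one reference,
    whether it is a group or a plain RegionRef / RegionRefIndexed.
    """
    refs = Counter(
        el.get("regionRef")
        for el in reading_order.iter()
        if el.get("regionRef") is not None
    )
    return sorted(ref for ref, count in refs.items() if count > 1)


# Hypothetical reading order in which region "r1" is referenced twice.
SAMPLE = """<ReadingOrder>
  <OrderedGroup id="ro1">
    <RegionRefIndexed index="0" regionRef="r1"/>
    <OrderedGroupIndexed id="g1" index="1" regionRef="t1">
      <RegionRefIndexed index="0" regionRef="c1"/>
      <RegionRefIndexed index="1" regionRef="r1"/>
    </OrderedGroupIndexed>
  </OrderedGroup>
</ReadingOrder>"""

duplicates = duplicate_region_refs(ET.fromstring(SAMPLE))
print(duplicates)
```

An empty result means a single global region ID dictionary derived from the recursive structure is safe to build.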
Both actually. And from my perspective the two tasks are not so different, because OCR-D processors always annotate incrementally, i.e. they have to parse and re-generate.
Ok, that's good to know. So the parser and viewer are tolerant, but the generator and editor are strict.
I am struggling to get PageViewer to correctly visualise a valid but recursive reading order.
Example:
Let's start off with a page without region recursion:
PAGE-XML snippet
ReadingOrder contains a single OrderedGroup with a flat list of RegionRefIndexed, including tables without further structure. PageViewer correctly displays the order.
Now let's add some TextRegion cells to the tables, and properly include them in ReadingOrder: each RegionRefIndexed becomes an OrderedGroupIndexed of the same @regionRef and @index:
PAGE-XML snippet
As you can see, PageViewer ignores the recursive group entries (for the tables) in the arrow path, but seems to add them at the end with position 0,0.
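For readers without the attached file, the restructuring step described above (a table's RegionRefIndexed turning into an OrderedGroupIndexed with the same @regionRef and @index) could look roughly like this. It is an illustrative sketch only: the helper name, the group ID scheme and the region IDs are made up, and the PAGE namespace is left out.

```python
import xml.etree.ElementTree as ET


def ref_to_group(parent, ref, cell_ids):
    """Replace the RegionRefIndexed `ref` (a child of `parent`) in place
    by an OrderedGroupIndexed with the same @regionRef and @index,
    whose members are the given cell region IDs in order."""
    group = ET.Element("OrderedGroupIndexed", {
        "id": "group_" + ref.get("regionRef"),  # hypothetical ID scheme
        "regionRef": ref.get("regionRef"),
        "index": ref.get("index"),
    })
    for i, cell_id in enumerate(cell_ids):
        ET.SubElement(group, "RegionRefIndexed",
                      {"index": str(i), "regionRef": cell_id})
    position = list(parent).index(ref)  # keep the document position
    parent.remove(ref)
    parent.insert(position, group)
    return group


# Hypothetical flat reading order; "t1" stands for a table region.
top = ET.fromstring(
    '<OrderedGroup id="ro1">'
    '<RegionRefIndexed index="0" regionRef="r1"/>'
    '<RegionRefIndexed index="1" regionRef="t1"/>'
    '</OrderedGroup>'
)
ref_to_group(top, top[1], ["t1_c1", "t1_c2"])
print(ET.tostring(top, encoding="unicode"))
```

Replacing the element at its original position keeps the document order of the top-level group unchanged.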
So, maybe PageViewer does not like the XML ordering by type, but wants it sorted by @index?
PAGE-XML snippet
It seems that the order is better that way, but the position of the recursive group entries is still at 0,0.
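Reordering the members by @index rather than by element type can be sketched as below. This is an assumption-laden illustration (hypothetical function name and IDs, no PAGE namespace), and it makes no claim about what PageViewer itself requires.

```python
import xml.etree.ElementTree as ET


def sort_by_index(group):
    """Recursively reorder the children of a group by integer @index.

    Groups whose children lack @index (unordered groups) keep their order.
    """
    for child in group:
        sort_by_index(child)
    if all(child.get("index") is not None for child in group):
        group[:] = sorted(group, key=lambda child: int(child.get("index")))


# Hypothetical group serialised by element type: plain refs first, groups last.
by_type = ET.fromstring(
    '<OrderedGroup id="ro1">'
    '<RegionRefIndexed index="0" regionRef="r1"/>'
    '<RegionRefIndexed index="2" regionRef="r2"/>'
    '<OrderedGroupIndexed id="g1" index="1" regionRef="t1">'
    '<RegionRefIndexed index="1" regionRef="c2"/>'
    '<RegionRefIndexed index="0" regionRef="c1"/>'
    '</OrderedGroupIndexed>'
    '</OrderedGroup>'
)
sort_by_index(by_type)
print([child.get("regionRef") for child in by_type])
```

After the call, both the top-level group and the nested group list their members in ascending @index order.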
How am I supposed to represent this?
Full file and image:
Gutachten2-2.zip