[Text_alignment / OCR] Syllables not being picked up for MS73 #1215

JoyfulGen · 2024-10-10T08:45:34Z

UPDATE: This might be in part due to a user mistake (here we are again), so please hold!

I've started running some e2e OMR workflows with MS73 folios, and the text part of the process doesn't seem to be working. Normally, the original image is separated into layers, and the text layer is sent to the Text_Alignment job. That job uses OCR to roughly locate the syllables and then matches them with the correct text that we provide. In Neon, the syllables should look like this (this is a Salzinnes folio):

However, this process doesn't seem to be working for MS73. So far, @kyrieb-ekat got this (enjoy the numbers):

And I got no syllables at all:

Because the syllable text is directly related to how the neumes are grouped into syllables, these errors result in the syllable groupings being completely wrong, which lengthens the correction time quite a bit.

I ran an e2e OMR workflow with an Einsie folio and the syllables were perfect, so this seems to be an MS73-specific problem. Could it simply be that the OCR model we've been using for Salzinnes and Einsie doesn't work for MS73? In the Text_Alignment job, the model is built directly into the job, so I don't think this is something I can change.


JoyfulGen · 2024-10-10T11:38:03Z

ANOTHER UPDATE: This was indeed in part due to user error (you can always count on me). I mistakenly assigned the wrong layer output to the input of the Text_Alignment job, which is why my syllables came up completely empty.

However! Kyrie did not make that mistake, so her result reflects what the job actually produces. After fixing my mistake, I ran a couple more workflows and got something similar: there are syllables, but far too few, and the ones that do appear are incorrect. I'm not sure yet what is causing this; it's possible that the syllable problem will lessen as our glyph classification training data improves. I'll put this issue on hold until we know more.

kyrieb-ekat · 2024-10-10T18:22:03Z

I'm also going to retrace some of the previous steps taken on this and test a few more pages of MS73. I'll also look into OCRopus to see what the thinking was behind the OCR models used in Text_Alignment.

JoyfulGen added the Text Alignment (Issues pertaining to text alignment), Rodan job, and bug labels and removed the Priority: HIGH label Oct 10, 2024

JoyfulGen added the ON HOLD label Oct 10, 2024

kyrieb-ekat mentioned this issue Oct 11, 2024 in "How do we add new OCR models, and what do we use to train them?" #1217 (Open)
