Fix slub/dfgviewer#147 in ALTO parser #455

Open
sebastian-meyer opened this issue Jan 13, 2020 · 2 comments
Labels
⚙ feature A new feature or enhancement.

Comments

@sebastian-meyer
Member

We have fixed slub/dfg-viewer#147 in a rather quick-and-dirty way. A better solution would be to fix the issue directly in the ALTO parser of Kitodo.Presentation.

@sebastian-meyer added the ⚙ feature label on Jan 13, 2020
@bertsky

bertsky commented Jun 24, 2021

Your fix now also allows rendering text that contains (HTML-encoded) newlines but no SP elements (or not even multiple distinct TextLine elements). See here for an example. (This ALTO was produced by the page-to-alto converter with --alto-version 2.0 --dummy-textline --dummy-word in effect.)

It would be great if that workaround kept working in the future (because full texts without true/correct text line and word segmentation are a valid use case).

But it also shows that it is important for readability that at least some newlines get rendered. In my example, the newlines are already included in the string. But Presentation should also insert them between successive TextLine elements.
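To make the requested behavior concrete, here is a minimal sketch in Python. It is not Presentation's actual parser code; the helper names `local_name` and `alto_to_text` and the sample document are made up for illustration. It assumes the usual ALTO layout of TextLine elements containing String (with a CONTENT attribute) and SP children, keeps newlines that are already encoded inside CONTENT, and inserts a newline between successive TextLine elements.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix so that any ALTO version matches."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(alto_xml: str) -> str:
    """Hypothetical helper: plain text with one entry per TextLine."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        parts = []
        for child in element:
            name = local_name(child.tag)
            if name == "String":
                # Newlines already encoded in CONTENT (&#10;) are kept as-is.
                parts.append(child.get("CONTENT", ""))
            elif name == "SP":
                # An SP element stands for a space between two words.
                parts.append(" ")
        lines.append("".join(parts))
    # Insert a newline between successive TextLine elements.
    return "\n".join(lines)


# Made-up sample: one regular line with SP, one "dummy" line whose single
# String carries an encoded newline and no SP at all.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Hello"/><SP/><String CONTENT="world"/></TextLine>
    <TextLine><String CONTENT="first&#10;second"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(alto_to_text(SAMPLE))
```

With this approach, both cases render readably: properly segmented lines are separated by inserted newlines, and a dummy text line keeps the newlines it already carries in its string.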

@bertsky

bertsky commented Feb 17, 2023

BTW, the ALTO download then removes the HTML-encoded newline characters – too bad!
