New problems with leading/trailing whitespace around linebreaks in text content #101
If everything in the text content ( This also shows that the cause of this issue is the spaces introduced by joining lines, which is behaviour we usually want to have:
<t-str>foo</t-str>
<t-str>bar</t-str>
The above should serialize as But... if we have an explicit linebreak:
<t-str>foo</t-str>
<br/>
<t-str>bar</t-str>
then this no longer makes sense and we want |
…IMPLICITSPACE property
I'm afraid we may have to add another chapter to our whitespace problems; this is the sequel to issue #88 ...
I have a paragraph with the following text:
This is produced by my latest additions to FoLiA-page (PageXML to FoLiA conversion, pagexml-br branch of foliautils).
In addition, PageXML generates string annotations, which in turn relate back to the original PageXML:
The problem is, the offsets don't match up because of leading/trailing spaces. foliavalidator and folialint report the same:
This is the full text the library sees, which is also what both folia2txt and FoLiA-2text produce. I marked leading/trailing whitespace with an underscore for visibility:
Note the initial whitespace for all but the first line. So where I'd expect S\nJ we get S\s\n\sJ instead. I think this is unexpected behaviour and qualifies as a bug we'd want to fix. The offsets as reported in the FoLiA-page output seem correct to me.
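
To make the expected versus actual behaviour concrete, here is a minimal Python sketch. It is not the code of the FoLiA libraries or of foliautils; the function names (`serialize_naive`, `serialize_break_aware`, `offset_matches`) and the flat list-of-parts model are assumptions made purely for illustration. It contrasts joining every part with an implicit space against a variant that suppresses the implicit space next to an explicit linebreak, and shows how the extra spaces shift a string offset.

```python
BR = "\n"  # stands in for an explicit linebreak (<br/>) in this toy model


def serialize_naive(parts):
    """Put an implicit space between all neighbours, even next to a linebreak."""
    return " ".join(parts)


def serialize_break_aware(parts):
    """Put an implicit space between neighbours unless one of them is a linebreak."""
    out = []
    for i, part in enumerate(parts):
        if i > 0 and part != BR and parts[i - 1] != BR:
            out.append(" ")
        out.append(part)
    return "".join(out)


def offset_matches(text, offset, expected):
    """Check that `expected` occurs in `text` exactly at `offset`."""
    return text[offset:offset + len(expected)] == expected


parts = ["S", BR, "J"]
naive = serialize_naive(parts)        # "S \n J", i.e. S\s\n\sJ
aware = serialize_break_aware(parts)  # "S\nJ"

# A string annotation for "J" at offset 2 only resolves in the break-aware text.
print(offset_matches(naive, 2, "J"), offset_matches(aware, 2, "J"))
```

Without an explicit linebreak the two variants agree, so `serialize_break_aware(["foo", "bar"])` still yields `foo bar`, which is the line-joining behaviour we usually want.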