-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression: Text consistency breakage since FoLiA v2.4.1 #92
Comments
From the second example: <t>op de wyse: <t-style class="i">Pharaos VVimpelen ontdaan, en sietmen niet meer svvieren. </t-style>Oft, <t-style class="i">Nu singt, nu springt, &c.</t-style></t>
|
In the first example, a space is removed because of #88, but the enrichment doesn't reflect that because it was done with the older version, so a In the second example, the element includes a trailing space, which is removed because of #88. In this case, it leads to a worse result because we do want that space (but it should have been as a trailing space after |
A third example: <t>NIemand, die zig maar eenigzints met de Wereld, ik wil zeggen, met Wereldsche noodzakelyke Bezigheeden, Bedryven en Hanteeringen, ophoud, waaronder 100, ja 1000derley nuttige en nootwendige dingen begrepen zyn, hier ondoenlyk en onnodig alle op te noemen, is onbekent, hoe nuttig en dienstig de zo genaamde <t-style class="i">Couranten </t-style>of<t-style class="i"> Nieuws-Papieren </t-style>zyn, namelyk, om dezelve te lezen: Dat dus ook niemand ontkennen zal, die deze zake maar eenigzints met een Oog van dieper inzigt en met een groter overleg aanmerkt als het Gemeen; waar over we nu in 't begin van onze Redenering niet zullen uitweiden, alleen voor eerst maar zeggende, dat 't om de Nuttigheid der Couranten is, waarom het in alle Landschappen van Europa, en zelfs verder, geoorlooft is, en van de Hoge Regenten gepriviligeert word, om Couranten uit te geven, die vervolgens elk kopen en lezen kan, om te hooren wat 'er in de Weereld omgaat, 't welk we in 't vervolg staan nader op te helderen.</t> Here we would obtain "CourantenofNieuws-Papierenzyn" due to the odd formatting that's no longer compatible with #88. |
I'm proposing the following solution: If there is a text consistency error, check whether the error would also occur in the old situation prior to #88 where leading/trailing whitespace was not stripped. If things are consistent according to the old rules, then the consistency error becomes a warning rather than a hard error. This way we preserve backward compatibility while still explicitly communicating that the newer situation is preferable. Ideally, at some point we implement an algorithm to automatically fix this issue, in |
This is what we get now with the solution in place, it's verbose but it gets the message across I think:
|
…(not done yet, have to address the logical in parseXml still)
This has now been implemented in both foliapy (released) and libfolia (not yet released) |
…ario (proycon/folia#88), implemented extra backward compatibility for offsets (proycon/folia#92)
We introduced a problem with #88 that breaks backward compatibility. In Nederlab we have a file with (note all the spacing and newlines):
This passed validation for FoLiA <v2.4.1, but now with FoLiA >=v2.4.1 it fails with a text consistency problem:
Foliapy and libfolia are consistent in reporting the error. I'm still investigating why it goes wrong exactly. Obviously, we fixed #88 for a good reason, but we also don't want to invalidate older FoLiA files that were valid before. Ideally, we find a solution that maintains the solution in #88 and also validates against these older files. Otherwise we will need a patchy solution so the newer validator doesn't invalidate the older FoLiA.
To be continued...
The text was updated successfully, but these errors were encountered: