text inside <t-hbr> nodes is allowed, but problematic #25
Comments
I think the idea was that <t-hbr> could contain the actual hyphenation symbol that was used in the text, reflecting that it is an actual symbol appearing in the text. You'd need it again if you wanted to serialize the text exactly as it was. If we forbid it entirely as you suggest (which does make things simpler), we'd need to assume there is one generic hyphenation symbol (e.g. a hyphen) that we can use in case the user wants the 'original' text exactly. By default, foliapy indeed resolves the
I'm also a bit hesitant to simplify this now, after the fact, as it would break backward compatibility (I don't know whether there are a lot of documents out there using this; probably not, but I can't be sure).
Which was exactly what I tried for FoLiA-abby, leading to this discussion.
Where 'resolves' means ignoring it completely.
Looks good yes!
In hindsight, I think it is much better NOT to abuse the class feature for storing the actual hyphen. That an evil user could use that value to store more than just a hyphen is not really a problem, imho. As a bonus, we could allow text inside a
I agree with your assessment. What is in the text should be text, so putting the hyphens in the XML text was the right decision, as these hyphens often appear verbatim in the text, even though by default we may want to ignore this text.
Agreed
That makes sense, yes. It does raise some difficulties due to the way XML deals with whitespace if the element were to be split over multiple lines.
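To illustrate the whitespace concern, here is a minimal sketch using Python's standard `xml.etree.ElementTree` rather than foliapy or libfolia. The fragments are made up for illustration and omit the FoLiA namespace; the only point is that a `<t-hbr>` split over multiple lines picks up the newline and indentation as part of its text.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments (not taken from a real document), namespace omitted.
compact = ET.fromstring("<t>zin<t-hbr>-</t-hbr>nen</t>")
pretty = ET.fromstring("<t>zin<t-hbr>\n    -\n  </t-hbr>nen</t>")

compact_symbol = compact.find("t-hbr").text
pretty_symbol = pretty.find("t-hbr").text

print(repr(compact_symbol))  # just the hyphen
print(repr(pretty_symbol))   # hyphen surrounded by newline and indentation

# A consumer would have to decide whether to strip the symbol before using it.
print(repr(pretty_symbol.strip()))
```

Whether stripping is the right policy is an open question here: it would also remove a symbol that is itself whitespace.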
Ok, I will stick to that solution then.
Well, I was referring to a
I will implement this and create some simple examples for testing.
I created a simple example to demonstrate what libfolia is capable of now:
<?xml version="1.0" encoding="UTF-8"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="markup" generator="libfolia-v2.14" version="2.5.1">
<metadata type="native">
<annotations>
<sentence-annotation set="FoLiA-txt-set"/>
<paragraph-annotation set="FoLiA-txt-set"/>
<linebreak-annotation set="FoLiA-txt-set"/>
<text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl"/>
<hyphenation-annotation set="FoLiA-txt-set"/>
<hspace-annotation set="FoLiA-txt-set"/>
</annotations>
</metadata>
<text xml:id="markup.text">
<p xml:id="markup.p.1">
<s xml:id="markup.p.1.s.1">
<t>Dit is een test. Met enkele afgebroken zin<t-hbr>-</t-hbr><br space="no"/>nen. Dit is de eerste. Met ook een soft af<t-hbr>¬</t-hbr><br space="no"/>breking.<br/></t>
</s>
<s xml:id="markup.p.1.s.2">
<t>En dit heeft een lange'<t-hspace> </t-hspace>'space val.<br/></t>
</s>
<s xml:id="markup.p.1.s.3">
<t>En dit heeft een enkele'<t-hspace/>'space val.<br/></t>
</s>
<s xml:id="markup.p.1.s.4">
<t>En dit heeft een speciale'<t-hspace>⍽</t-hspace>'space val.<br/></t>
</s>
</p>
</text>
</FoLiA>
Both folialint and foliavalidator accept this file. OK
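As a rough sketch of how the hyphenation symbols stored in such a document can be read back, the snippet below uses Python's standard `xml.etree.ElementTree` (not libfolia or foliapy). The input is a shortened copy of the first sentence of the example above; the namespace URI is the one from that example, and everything else (variable names, the trimmed document) is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"  # namespace taken from the example above

# Shortened, hand-trimmed copy of the first sentence of the example.
doc = (
    f'<FoLiA xmlns="{FOLIA_NS}"><text><p><s>'
    '<t>Met enkele afgebroken zin<t-hbr>-</t-hbr><br space="no"/>nen. '
    'Met ook een soft af<t-hbr>\u00ac</t-hbr><br space="no"/>breking.</t>'
    '</s></p></text></FoLiA>'
)

root = ET.fromstring(doc)

# Collect the literal symbol stored inside each <t-hbr>, in document order.
symbols = [hbr.text or "" for hbr in root.iter(f"{{{FOLIA_NS}}}t-hbr")]
print(symbols)
```

Because the symbol is ordinary element text, both the plain hyphen and the soft-hyphen marker survive and can be used when the original text has to be reproduced exactly.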
NOTE @proycon: folia2text slides in some leading spaces after the first sentence. A minor bug? But FoLiA-2text can now also produce, (using
Which is very cool, I think. Second NOTE: this example is handcrafted. At the moment there is no DIRECT way to create a FoLiA file with "special"
@pirolen Please note: we reverted to the solution from before January 17th. Sorry for the inconvenience.
Given this example:
folia2txt produces:
Appeltaart
Which is maybe consistent with the docs, although it is not explicitly forbidden ('should' isn't 'may not').
FoLiA-2text gives:
Appelperentaart
so it does interpret the embedded text.
My problem with this is that NOT including the text violates the principle of least surprise. By just looking at the text of the <p> you might expect the peren to show up.
The best solution, imho, is to explicitly forbid text content inside a <t-hbr> and give an error when it is attempted.
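The two behaviours can be sketched with a small, hypothetical helper. The input below is an assumption reconstructed from the two outputs quoted above (the original example is not preserved in this thread), the FoLiA namespace is omitted, and `text_of` is an illustrative function, not the way folia2txt or FoLiA-2text are actually implemented.

```python
import xml.etree.ElementTree as ET

def text_of(elem, include_hbr_text):
    """Concatenate the text of elem, optionally skipping <t-hbr> content."""
    parts = [elem.text or ""]
    for child in elem:
        # Skip the content of <t-hbr> unless the caller asks for it.
        if child.tag != "t-hbr" or include_hbr_text:
            parts.append(text_of(child, include_hbr_text))
        # Text following the child always belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)

# Hypothetical input, namespace omitted.
t = ET.fromstring("<t>Appel<t-hbr>peren</t-hbr>taart</t>")

ignored = text_of(t, include_hbr_text=False)
included = text_of(t, include_hbr_text=True)
print(ignored)   # Appeltaart
print(included)  # Appelperentaart
```

With arbitrary text allowed inside `<t-hbr>`, the two policies diverge visibly, which is the surprise described above.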