change markup of fs in annotation document #5

Open
dasch124 opened this issue Dec 16, 2022 · 3 comments

@dasch124

dasch124 commented Dec 16, 2022

Currently an annotation in the annotation document looks like this:

<fs xmlns="http://www.tei-c.org/ns/1.0" xml:id="anid_2308">
   <f name="trans">
      <string>abad</string>
   </f>
   <f name="pos">
      <string></string>
   </f>
   <f name="gloss">
      <string></string>
   </f>
   <f name="msd">
      <string></string>
   </f>
   <f name="root">
      <string></string>
   </f>
   <f name="dict">
      <string>abad_000</string>
   </f>
</fs>

If possible, I'd like to make this more expressive, so that the markup itself shows which values are strings and which are references:

<fs xmlns="http://www.tei-c.org/ns/1.0" xml:id="anid_2308">
   <f name="trans">
      <string>abad</string>
   </f>
   <f name="pos" fVal="flib:{id}"/>
   <f name="gloss">
      <string></string>
   </f>
   <f name="msd" fVal="flib:{id}"/>
   <f name="root">
      <string></string>
   </f>
   <f name="dict" fVal="dict:abad_000"/>
</fs>

The pos feature would point to "pos.v", whereas the msd feature would point to the full morphosyntactic tag "v_past_sg_2".
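
To illustrate what the proposed markup gives a consumer, here is a minimal Python sketch that reads one fs and tells references (an fVal attribute) apart from literal strings (a string child). The pointers flib:pos.v and flib:v_past_sg_2 are assumptions built from the flib:{id} template and the two ids mentioned above; read_features is a hypothetical helper, not part of enricher.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Sample in the proposed markup. The flib:... values are assumed ids,
# filled in from the "flib:{id}" template for illustration only.
SAMPLE = """
<fs xmlns="http://www.tei-c.org/ns/1.0" xml:id="anid_2308">
   <f name="trans"><string>abad</string></f>
   <f name="pos" fVal="flib:pos.v"/>
   <f name="gloss"><string></string></f>
   <f name="msd" fVal="flib:v_past_sg_2"/>
   <f name="root"><string></string></f>
   <f name="dict" fVal="dict:abad_000"/>
</fs>
""".strip()


def read_features(fs_xml):
    """Return {feature name: (kind, value)}, kind being 'ref' or 'string'."""
    fs = ET.fromstring(fs_xml)
    features = {}
    for f in fs.findall(f"{{{TEI_NS}}}f"):
        name = f.get("name")
        pointer = f.get("fVal")
        if pointer is not None:
            # A feature with fVal is a reference to a value defined elsewhere.
            features[name] = ("ref", pointer)
        else:
            # Otherwise take the text of the <string> child; empty if absent.
            string_el = f.find(f"{{{TEI_NS}}}string")
            text = string_el.text if string_el is not None else None
            features[name] = ("string", text or "")
    return features


features = read_features(SAMPLE)
for name, (kind, value) in features.items():
    print(f"{name}: {kind} {value!r}")
```

With the current markup every feature would come back as a string, so the difference between an empty annotation and an unresolved reference is not visible from the document alone.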

@dasch124

@charlymo do you think this would break functionality within enricher?

@dasch124

CAVEAT: Any changes to the annotation document must also be reflected in https://github.com/acdh-oeaw/shawi-data/blob/main/082_scripts_xsl/copyAnaToVert.xsl#L42

@dasch124

After meeting 2023-01-30:
Words in Turkish will be annotated as well, but not linked to a dictionary entry. During validation, we need to distinguish these from tokens whose dictionary reference is erroneously missing. We are therefore introducing a new feature, a "language tag", in the annotation document for tokens representing foreign lexical items:

<f name="lang" fVal="iso6393:tur"/>

(with the iso6393 prefix standing for https://vocabs.acdh.oeaw.ac.at/iso6393/{id})
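
A rough sketch of the validation rule this implies, assuming the fVal-based markup proposed above is adopted: an fs with a dict reference counts as linked, one without it but with a lang feature counts as a foreign item, and anything else is flagged as missing. The function names, the three status labels and the sample fs are hypothetical; only the iso6393 expansion is taken from this thread.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
F_TAG = f"{{{TEI_NS}}}f"

# Only the iso6393 expansion is documented in the thread; other prefixes
# (flib, dict) are left unexpanded here because their targets are not given.
PREFIXES = {"iso6393": "https://vocabs.acdh.oeaw.ac.at/iso6393/{id}"}


def expand(pointer):
    """Expand a prefixed pointer to a URL if the prefix is known."""
    prefix, _, local = pointer.partition(":")
    template = PREFIXES.get(prefix)
    return template.format(id=local) if template else pointer


def dict_status(fs):
    """Classify an <fs> element as 'linked', 'foreign' or 'missing'."""
    refs = {f.get("name"): f.get("fVal") for f in fs.iter(F_TAG) if f.get("fVal")}
    if "dict" in refs:
        return "linked"
    if "lang" in refs:
        # Foreign lexical item: no dictionary entry is expected.
        return "foreign"
    return "missing"


# Hypothetical annotation for a Turkish token (id and content made up).
turkish = ET.fromstring(
    '<fs xmlns="http://www.tei-c.org/ns/1.0" xml:id="anid_example">'
    '<f name="trans"><string></string></f>'
    '<f name="lang" fVal="iso6393:tur"/>'
    "</fs>"
)
unlinked = ET.fromstring(
    '<fs xmlns="http://www.tei-c.org/ns/1.0" xml:id="anid_example2">'
    '<f name="trans"><string></string></f>'
    "</fs>"
)

print(dict_status(turkish), dict_status(unlinked))
print(expand("iso6393:tur"))
```

Whether a token carrying both dict and lang should be reported separately is left open here; the sketch simply treats it as linked.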
