Tagging mechanism to aid processors #93
Comments
Small addition: I think we should encourage processors to clean up the tags they 'consume', leaving the resulting FoLiA as clean as possible.
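A minimal sketch of what such clean-up could look like, assuming the space-delimited `tag` attribute proposed in this issue. It uses Python's standard `xml.etree.ElementTree` on a simplified, namespace-free snippet rather than a real FoLiA library; the helper `consume_tag` and the tag values `token` and `review` are hypothetical.

```python
import xml.etree.ElementTree as ET

def consume_tag(element, consumed):
    """Remove one value from the element's space-delimited `tag` attribute."""
    remaining = [t for t in element.get("tag", "").split() if t != consumed]
    if remaining:
        # Other tools' tags are left untouched.
        element.set("tag", " ".join(remaining))
    else:
        # Drop the attribute entirely so no empty tag="" is left behind.
        element.attrib.pop("tag", None)

# Simplified stand-in for FoLiA markup (no namespace, hypothetical tags).
doc = ET.fromstring('<s><w tag="token review">a</w><w tag="token">b</w></s>')
for el in doc.iter():
    consume_tag(el, "token")
cleaned = ET.tostring(doc, encoding="unicode")
```

Only the consumed value is removed, so cues meant for a later tool in the chain survive.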
proycon added a commit that referenced this issue, Apr 2, 2021
proycon added a commit to proycon/foliapy that referenced this issue, Apr 2, 2021
proycon added a commit to proycon/foliapy that referenced this issue, Apr 2, 2021
proycon added a commit that referenced this issue, Apr 2, 2021
proycon added a commit to proycon/foliapy that referenced this issue, Apr 2, 2021
…gular 'tag' for the XML attribute (proycon/folia#93)
proycon added a commit that referenced this issue, Apr 2, 2021
proycon added a commit to proycon/foliapy that referenced this issue, Apr 2, 2021
proycon added a commit to LanguageMachines/libfolia that referenced this issue, Apr 2, 2021
proycon added a commit to proycon/foliapy that referenced this issue, Apr 6, 2021
I propose we introduce a generic `tag` attribute that allows people to tag any FoLiA element, the value being a space-delimited list drawn from a tool-specific vocabulary that FoLiA itself does not define. These tags can be used by FoLiA tools to help their processing. We're essentially encoding an extra 'cue' in the FoLiA to help another tool do its job; such a cue may be needed because the information is not present in the FoLiA yet, or is encoded in too complex a way for the other tool to unravel.
A use case emerged from #88, where we need cues in untokenised FoLiA text to help the tokeniser determine where to force a token boundary.
We can also imagine a tool A that 'tags' specific elements given some complex search criteria, and a tool B that then operates on all elements that carry a particular tag. Tags would here serve to keep the two tools separate and specialised (Unix philosophy).
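The tool A / tool B split could be sketched roughly as follows. This is an illustration under assumptions, not an existing FoLiA API: it uses the standard `xml.etree.ElementTree` on simplified, namespace-free markup, and the functions `tool_a` and `tool_b`, the tag `shout`, and the uppercase criterion are all hypothetical stand-ins.

```python
import xml.etree.ElementTree as ET

def add_tag(element, tag):
    """Append a value to the space-delimited `tag` attribute, avoiding duplicates."""
    tags = element.get("tag", "").split()
    if tag not in tags:
        tags.append(tag)
    element.set("tag", " ".join(tags))

def has_tag(element, tag):
    """True if the element carries the given tag value."""
    return tag in element.get("tag", "").split()

def tool_a(root):
    """Tool A: select elements (here: all-uppercase words) and tag them."""
    for w in root.iter("w"):
        if w.text and w.text.isupper():
            add_tag(w, "shout")

def tool_b(root):
    """Tool B: knows nothing about the criteria; it only looks for the tag."""
    return [el.text for el in root.iter() if has_tag(el, "shout")]

doc = ET.fromstring("<s><w>HELLO</w><w>world</w><w tag='seen'>STOP</w></s>")
tool_a(doc)
selected = tool_b(doc)
```

Tool B depends only on the tag value, so the selection logic in tool A can change without touching tool B.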
The tags carry no intrinsic meaning for the FoLiA representation whatsoever (we have `class` for that already); they are merely signals to further tools in the processing chain.
We could then use a value like `token` or `separate` for the tokenisation cues.
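As a hedged illustration of such a tokenisation cue (a hypothetical reconstruction, not the markup from the original post): the sketch below assumes inline text-markup spans carrying `tag="token"` inside untokenised text, written as simplified, namespace-free XML, and a toy whitespace tokeniser `tokenise` that treats each tagged span as a forced token boundary.

```python
import xml.etree.ElementTree as ET

def tokenise(t):
    """Toy whitespace tokeniser that forces boundaries around spans tagged 'token'."""
    parts = [t.text or ""]
    for child in t:
        text = "".join(child.itertext())
        if "token" in child.get("tag", "").split():
            # Pad with spaces so the span is split off on both sides.
            parts.append(" " + text + " ")
        else:
            parts.append(text)
        parts.append(child.tail or "")
    return "".join(parts).split()

# Hypothetical untokenised text with two cue-carrying spans.
t = ET.fromstring(
    '<t>the years <t-str tag="token">1999</t-str>/'
    '<t-str tag="token">2000</t-str></t>'
)
with_cues = tokenise(t)
without_cues = "".join(t.itertext()).split()
```

Without the cues, a plain whitespace split keeps `1999/2000` together; with them, the tagged spans and the slash between them come out as separate tokens.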