Tagging mechanism to aid processors #93

Closed
proycon opened this issue Mar 30, 2021 · 1 comment
Labels
enhancement ready Implemented but not released yet

proycon (Owner) commented Mar 30, 2021

I propose we introduce a generic tag attribute that can be set on any FoLiA element. Its value is a space-delimited list of tags drawn from an open, tool-specific vocabulary. FoLiA tools can use these tags to aid their processing: we are essentially encoding an extra 'cue' in the FoLiA to help another tool do its job. Such a cue may be needed because the information is not present in the FoLiA yet, or because it is encoded in a way that is too complex for the other tool to unravel.

A use case emerged from #88 where we need cues in untokenised FoLiA text to help the tokeniser determine where to force a token boundary:

<t>
  <t-str>item1<t-style tag="token"><feat class="superscript" subset="font_typeface"/>2</t-style></t-str><t-str>something</t-str>
</t>
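To make the "space-delimited list" reading concrete, here is a minimal sketch of how a consumer might look up such cues. It uses only Python's standard xml.etree module and the un-namespaced snippet above (a real FoLiA document is namespaced); the helper names tags_of and elements_with_tag are hypothetical and not part of any FoLiA library.

```python
import xml.etree.ElementTree as ET

# The snippet from this issue, without the FoLiA namespace (assumption for brevity).
SNIPPET = """<t>
  <t-str>item1<t-style tag="token"><feat class="superscript" subset="font_typeface"/>2</t-style></t-str><t-str>something</t-str>
</t>"""


def tags_of(element):
    """Return the set of tags on an element (empty set if it has no tag attribute)."""
    return set(element.get("tag", "").split())


def elements_with_tag(root, tag):
    """Return all elements (root included) whose tag attribute contains `tag`."""
    return [el for el in root.iter() if tag in tags_of(el)]


root = ET.fromstring(SNIPPET)
cued = elements_with_tag(root, "token")
print([el.tag for el in cued])  # ['t-style']
```

Splitting on whitespace rather than comparing the whole attribute value means an element carrying several tags is still matched.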

We can also imagine a tool A that 'tags' specific elements according to some complex search criteria, and a tool B that then operates on all elements carrying a particular tag. Here the tags would help keep the two tools separate and specialised (Unix philosophy).
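The tool A / tool B split could be sketched roughly as follows. Everything here is illustrative: tool_a, tool_b and add_tag are hypothetical names, the "reviewed" tag is made up to show that existing tags are preserved, and the search criterion (a superscript font_typeface feature) is just the one from the example in this issue.

```python
import xml.etree.ElementTree as ET


def add_tag(element, tag):
    """Append `tag` to the element's space-delimited tag list, keeping existing tags."""
    tags = element.get("tag", "").split()
    if tag not in tags:
        tags.append(tag)
    element.set("tag", " ".join(tags))


def tool_a(root):
    """Hypothetical tool A: tag every t-style that has a superscript feature."""
    for style in root.iter("t-style"):
        for feat in style.findall("feat"):
            if feat.get("subset") == "font_typeface" and feat.get("class") == "superscript":
                add_tag(style, "token")
                break


def tool_b(root):
    """Hypothetical tool B: collect the text of every element tagged 'token'."""
    return [
        "".join(el.itertext())
        for el in root.iter()
        if "token" in el.get("tag", "").split()
    ]


doc = ET.fromstring(
    '<t><t-str>item1<t-style tag="reviewed">'
    '<feat class="superscript" subset="font_typeface"/>2</t-style></t-str>'
    "<t-str>something</t-str></t>"
)
tool_a(doc)
selected = tool_b(doc)
print(selected)  # ['2']
```

Tool B never needs to know the criteria tool A applied; it only looks at the tag.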

The tags carry no intrinsic meaning for the FoLiA representation whatsoever (we have class for that already); they are merely signals to further tools in the processing chain.

For the tokenisation cues, we could then use a value like token or separate, as in the tag="token" example above.

@proycon proycon self-assigned this Mar 30, 2021
@proycon proycon added this to the v2.5.0 milestone Mar 30, 2021
proycon (Owner, Author) commented Mar 30, 2021

Small addition: I think we should encourage processors to clean up the tags they 'consume', leaving the resulting FoLiA as clean as possible.
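A rough sketch of what such clean-up could look like, again with the standard library only and on an un-namespaced fragment. consume_tag is a hypothetical helper, not an existing API, and the "keep" tag is invented to show that tags belonging to other tools are left alone.

```python
import xml.etree.ElementTree as ET


def consume_tag(element, tag):
    """Remove `tag` from the element's tag list; return True if it was present.

    The tag attribute is dropped entirely once no tags remain.
    """
    tags = element.get("tag", "").split()
    if tag not in tags:
        return False
    remaining = [t for t in tags if t != tag]
    if remaining:
        element.set("tag", " ".join(remaining))
    else:
        del element.attrib["tag"]
    return True


doc = ET.fromstring(
    '<t-str>item1<t-style tag="token keep">2</t-style>'
    '<t-style tag="token">3</t-style></t-str>'
)
consumed = [consume_tag(el, "token") for el in doc.iter("t-style")]
result = ET.tostring(doc, encoding="unicode")
print(result)
```

Removing only the consumed tag, rather than clearing the attribute, keeps cues intended for later tools in the chain intact.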

proycon added a commit to proycon/foliapy that referenced this issue Apr 2, 2021
proycon added a commit to proycon/foliapy that referenced this issue Apr 2, 2021
proycon added a commit to proycon/foliapy that referenced this issue Apr 2, 2021
proycon added a commit that referenced this issue Apr 2, 2021
proycon added a commit to proycon/foliapy that referenced this issue Apr 2, 2021
proycon added a commit to LanguageMachines/libfolia that referenced this issue Apr 2, 2021
@proycon proycon added ready Implemented but not released yet and removed in progress labels Apr 6, 2021
proycon added a commit to proycon/foliapy that referenced this issue Apr 6, 2021
@proycon proycon closed this as completed Apr 7, 2021