hxltmcli .asa.hxltm.json / .asa.hxltm.yml: HXLTM Abstractum Syntaxim Arborem #3

Open
fititnt opened this issue Jul 15, 2021 · 0 comments

fititnt commented Jul 15, 2021

# @ARCHIVUM       ontologia/cor.hxltm.yml
# @DESCRIPTIONEM  HXL Trānslātiōnem Memoriam (HXLTM)
# @LICENTIAM      Dominium publicum
formatum:
  # (...)

  HXLTM-ASA:
    __meta:
      archivum_extensionem: 
        - .asa.hxltm.json
        - .asa.hxltm.yml
      normam:
        - https://hdp.etica.ai/hxltm/archivum/#HXLTM-ASA
      descriptionem: |
        _[eng-Latn]
        The HXLTM-ASA is a not strictly documented Abstract Syntax Tree
        of a data conversion operation.

        This format, unlike the HXLTM permanent storage, is not
        meant to be used by end users. In fact, JSON (or other
        formats, like YAML) is more a tool for users debugging the initial
        reference implementation hxltmcli, or for developers using JSON
        as a more advanced input than the end-user permanent storage.

        Warning: The HXLTM-ASA is not meant to be a strictly documented format
        even if HXLTM eventually gets used by the general public. If necessary,
        some special format could be created, but this would require feedback
        from the community or some work already done by implementers.
        [eng-Latn]_

        Trivia:
          - abstractum, https://en.wiktionary.org/wiki/abstractus#Latin
          - syntaxim, https://en.wiktionary.org/wiki/syntaxis#Latin
          - arborem, https://en.wiktionary.org/wiki/arbor#Latin
          - conceptum de Abstractum Syntaxim Arborem
            - https://www.wikidata.org/wiki/Q127380
      nomen:
        eng-Latn: 'HXLTM Abstractum Syntaxim Arborem'
      situs_interretialis:
        referens_officinale:
          - https://hdp.etica.ai/hxltm
          - https://github.com/EticaAI/HXL-Data-Science-file-formats/issues/223
          - https://github.com/EticaAI/HXL-Data-Science-file-formats/labels/HXLTM

The idea of creating a format that uses HXL to store both translation memories (not just the XLIFF format) and glossaries, especially terminology, is hardcore. The code implementation is not the hard part; the problem it tries to abstract is what makes it complex.

Even if mostly for internal use (i.e. not strictly documented for external use), instead of directly 'converting' HXLated data (that is, CSVs) to other formats (especially the XML ones), we are already drafting what could be called an Abstract Syntax Tree (https://en.wikipedia.org/wiki/Abstract_syntax_tree). It can be a simple one, but at least we are not passing raw CSV pointers to the converters.
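As a rough illustration of the difference between handing converters raw CSV positions and handing them a tree, here is a minimal Python sketch. It is not the hxltmcli implementation: the function name `csv_to_tree`, the tree keys `conceptum` and `terminum`, and the hashtag layout in the sample data are assumptions made only for this example.

```python
import csv
import io
import json

# Hypothetical HXLated CSV: a header row, an HXL hashtag row, then data.
# The hashtags and the column layout are illustrative assumptions.
HXLATED_CSV = """Concept id,English,Portuguese
#item+conceptum+codicem,#item+rem+i_eng+is_latn,#item+rem+i_por+is_latn
C1,water,água
C2,fire,fogo
"""


def csv_to_tree(text):
    """Build a nested dict keyed by concept and language (hypothetical layout)."""
    rows = list(csv.reader(io.StringIO(text)))
    hashtags = rows[1]  # second row carries the HXL hashtags
    tree = {"conceptum": {}}
    for row in rows[2:]:
        cells = dict(zip(hashtags, row))
        concept_id = cells["#item+conceptum+codicem"]
        terms = {}
        for tag, value in cells.items():
            if "+rem" not in tag:
                continue
            attrs = tag.split("+")
            # "i_eng" -> "eng", "is_latn" -> "Latn"
            lang = next(a[2:] for a in attrs if a.startswith("i_"))
            script = next(a[3:] for a in attrs if a.startswith("is_"))
            terms[f"{lang}-{script.capitalize()}"] = value
        tree["conceptum"][concept_id] = {"terminum": terms}
    return tree


tree = csv_to_tree(HXLATED_CSV)
# Serialized this way, the tree could be saved as something like an
# .asa.hxltm.json file for debugging.
as_json = json.dumps(tree, ensure_ascii=False, indent=2)
```

A converter that receives `tree` can look up `tree["conceptum"]["C1"]["terminum"]["por-Latn"]` without knowing which CSV column held Portuguese.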

Comparison to other linguistic abstract syntaxes

See also:

It turns out that there are some long-standing ideas about abstracting linguistic content, but what could be called 'HXLTM ASA' works more at the container level (where it is useful for converting between file types) than at the term level (where it would help to understand what a term means in order to translate concepts).

So even if HXLTM ASA becomes usable by external tools, we will not try to micromanage it. One thing we can do here, though, is intentionally make it easy for others to convert it to whatever format they want, and not be strict about what HXLTM ASA is, so that anyone who wants to inject more detail at the term level can do so.
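One way to read "not strict" in practice is that a consumer only picks the keys it understands and ignores everything else. The Python sketch below assumes a hypothetical tree layout (`conceptum`, `terminum`, `valorem`) and a hypothetical extra key (`partem_orationis`); none of these names are defined by HXLTM ASA.

```python
def term_value(term):
    """Return the text of a term.

    Assumption for this sketch: a term is either a plain string or a
    mapping with a "valorem" key plus arbitrary extra details.
    """
    if isinstance(term, dict):
        return term.get("valorem")
    return term


def translation_pairs(tree, source, target):
    """Collect (source, target) pairs, ignoring any keys it does not know."""
    pairs = []
    for concept in tree.get("conceptum", {}).values():
        terms = concept.get("terminum", {})
        if source in terms and target in terms:
            pairs.append((term_value(terms[source]), term_value(terms[target])))
    return pairs


# Hypothetical tree: the second concept carries extra term-level detail
# injected by some other tool; the converter above does not need to know it.
example_tree = {
    "conceptum": {
        "C1": {"terminum": {"eng-Latn": "water", "por-Latn": "água"}},
        "C2": {
            "terminum": {
                "eng-Latn": {"valorem": "fire", "partem_orationis": "noun"},
                "por-Latn": "fogo",
            }
        },
    }
}

pairs = translation_pairs(example_tree, "eng-Latn", "por-Latn")
```

The design choice shown here is tolerance on the reading side: extra detail at the term level does not break a simpler converter.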

On Grammatical Framework

The Grammatical Framework (which is cited a lot in Abstract Syntax as Interlingua) seems to be the state of the art in how to understand sentences across different natural languages. I, Rocha, do not plan to go deep on this, since the short- to medium-term interest is more about how to store terminology and translation memories. If even the minimal implementation to support TBX export can already take time, the best I can do is make it easier (if interest exists years later) for people to use HXLTM dialects to store linguistic data while keeping decent portability to other data formats.

fititnt referenced this issue in EticaAI/HXL-Data-Science-file-formats Oct 19, 2021
fititnt referenced this issue in EticaAI/HXL-Data-Science-file-formats Oct 20, 2021
@fititnt fititnt transferred this issue from EticaAI/HXL-Data-Science-file-formats Nov 13, 2021