TSV to STAM conversion #1

proycon · 2023-03-25T21:08:19Z

Implement TSV (possibly also CSV but let's keep it simple) to STAM conversion.

proycon · 2023-03-26T13:30:11Z

Make the columns configurable, also for STAM to TSV conversion.

proycon · 2023-05-25T14:03:25Z

I have started implementing this now. The idea is to have a flexible and powerful
method of ingesting tabular stand-off annotation data into STAM and, if needed,
either automatically align it with a text file (i.e. compute offsets if they are
not explicitly provided) or even reconstruct the text file from scratch.

Users should be able to provide simple TSV data like:

Text	pos
Hello	interjection
world	noun

Here Text is a recognized column and pos is not, so pos translates to a
DataKey (pos) in an AnnotationSet (left undefined here). When loaded against an
existing resource file like the one below, the offsets are computed automatically:

Hello world!
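
To make the alignment step concrete, here is a minimal sketch in Python. It is an illustration only, not the actual implementation: the function name align_rows and the (text, value) row representation are hypothetical, and it assumes end-exclusive offsets and strictly sequential rows.

```python
def align_rows(text, rows):
    """Compute (begin, end) offsets for each row by searching forward in text.

    Assumes rows appear in the same order as in the text (sequential input).
    Offsets are end-exclusive and counted in Python str indices (code points).
    """
    offsets = []
    cursor = 0  # never search before the end of the previous match
    for token, _value in rows:
        begin = text.find(token, cursor)
        if begin == -1:
            raise ValueError(f"{token!r} not found at or after offset {cursor}")
        end = begin + len(token)
        offsets.append((begin, end))
        cursor = end
    return offsets


rows = [("Hello", "interjection"), ("world", "noun")]
offsets = align_rows("Hello world!", rows)
```

Moving the cursor forward after each match is what ties the result to row order: a repeated word is matched at its next occurrence rather than its first.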

Alternatively, this text (without the exclamation mark) can be reconstructed on
the basis of the input data (with a space as the output delimiter). Note that text
input doesn't need to be constrained to words/tokens. Reconstruction and
alignment both assume the input rows are sequential. If rows are explicitly
marked as not sequential (via some parameter), we can fall back on a tagging
mechanism that simply tags all found matches (e.g. at natural word boundaries).
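
Both ideas can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions: reconstruct_text and tag_all are hypothetical names, offsets are end-exclusive, and the regular expression \b is only an approximation of "natural word boundaries".

```python
import re


def reconstruct_text(rows, delimiter=" "):
    """Rebuild the text from sequential rows and return it with the offsets."""
    parts = []
    offsets = []
    cursor = 0
    for index, (token, _value) in enumerate(rows):
        if index > 0:
            cursor += len(delimiter)  # account for the delimiter before this token
        offsets.append((cursor, cursor + len(token)))
        cursor += len(token)
        parts.append(token)
    return delimiter.join(parts), offsets


def tag_all(text, token):
    """Non-sequential fallback: return every match of token at word boundaries."""
    pattern = re.compile(r"\b" + re.escape(token) + r"\b")
    return [match.span() for match in pattern.finditer(text)]


text, offsets = reconstruct_text([("Hello", "interjection"), ("world", "noun")])
matches = tag_all("the cat and the dog", "the")
```

Here text comes out as "Hello world", without the exclamation mark, because only the content of the rows is available to the reconstruction.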

The above illustrates the more complex case I want to support, where the input
data is incomplete. When more predefined columns are used, parsing can be much
simpler and no alignment or reconstruction is needed in the first place:

Text	BeginOffset	EndOffset	pos
Hello	0	5	interjection
world	6	11	noun
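
As a sketch of this simple mode, the Python below parses rows that carry explicit offsets and validates them against the text. It is not the actual implementation: parse_with_offsets and RECOGNIZED are hypothetical names, and it assumes end-exclusive offsets, under which "world" in "Hello world!" spans 6 to 11.

```python
import csv
import io

# Columns with a predefined meaning; all other columns are treated as custom keys.
RECOGNIZED = {"Text", "BeginOffset", "EndOffset"}


def parse_with_offsets(tsv, text):
    """Parse TSV rows with explicit offsets and check them against text."""
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t")
    annotations = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        begin, end = int(row["BeginOffset"]), int(row["EndOffset"])
        if text[begin:end] != row["Text"]:
            raise ValueError(
                f"line {line_no}: expected {row['Text']!r} at {begin}:{end}, "
                f"found {text[begin:end]!r}"
            )
        custom = {key: value for key, value in row.items() if key not in RECOGNIZED}
        annotations.append((begin, end, custom))
    return annotations


tsv = (
    "Text\tBeginOffset\tEndOffset\tpos\n"
    "Hello\t0\t5\tinterjection\n"
    "world\t6\t11\tnoun\n"
)
annotations = parse_with_offsets(tsv, "Hello world!")
```

The text check is what catches off-by-one offsets: an end offset of 10 for "world" would select "worl" and be rejected.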

proycon · 2023-06-05T18:03:11Z

This is mostly implemented now.

proycon added the enhancement New feature or request label Mar 25, 2023

proycon self-assigned this Mar 25, 2023

proycon added this to STAM: Stand-off Text Annotation Model May 3, 2023

proycon moved this to Todo in STAM: Stand-off Text Annotation Model May 3, 2023

proycon moved this from Todo to In Progress in STAM: Stand-off Text Annotation Model May 25, 2023

proycon added a commit that referenced this issue May 25, 2023

wip: starting from-tsv implementation #1

198a27d

proycon added a commit that referenced this issue May 29, 2023

wip: working on tsv import #1

49cce13

proycon added a commit that referenced this issue May 30, 2023

Implemented TSV import (simple mode only) #1

607e382

This includes text validation and support for custom columns. Other parse modes are to be implemented still.

proycon added a commit that referenced this issue May 31, 2023

implemented automatic text alignment for TSV import (untested still) #1

617921e

proycon added a commit that referenced this issue Jun 1, 2023

import: implemented ReconstructText parse mode #1

e7b8222

proycon added a commit that referenced this issue Jun 1, 2023

doc: documented stam import and stam export #1

70c8d44

proycon added the ready This has been implemented but not released yet label Jun 5, 2023

proycon mentioned this issue Jun 5, 2023

Import from CONLL-U (Plus) #3

Open

proycon closed this as completed Jun 7, 2023

github-project-automation bot moved this from In Progress to Done in STAM: Stand-off Text Annotation Model Jun 7, 2023
