Let's start thinking about how to document models #16

alix-tz · 2022-12-02T16:32:02Z

An example provided by @tboenig: https://tboenig.github.io/gt-metadata/document-your-gt.html (it ties the description of the model to the description of the dataset)
A proposal from @PonteIneptique:

Off the top of my head, properties should include (* = required):

Title*

Description*

Software (Name, Link, Version)*

DOI Link*

Project

Authors

Used datasets

Manuscript / Print / Both (Simpler than what we have for dataset)*

Languages*

Scripts*

Known characters

License*

Encoding*
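To make the proposed list easier to discuss, here is a minimal sketch of how it could be written down as a typed record. The class and field names (`ModelRecord`, `material`, `used_datasets`, and so on) are assumptions made for illustration, not an agreed schema, and the example values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed vocabulary for the "Manuscript / Print / Both" property.
ALLOWED_MATERIAL = {"manuscript", "print", "both"}


@dataclass
class Software:
    name: str
    link: str
    version: str


@dataclass
class ModelRecord:
    # Required properties (starred in the list above).
    title: str
    description: str
    software: Software
    doi: str
    material: str
    languages: List[str]
    scripts: List[str]
    license: str
    encoding: str
    # Optional properties.
    project: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    used_datasets: List[str] = field(default_factory=list)
    known_characters: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the assumed three-way vocabulary.
        if self.material not in ALLOWED_MATERIAL:
            raise ValueError(f"material must be one of {sorted(ALLOWED_MATERIAL)}")


# Placeholder values only; none of these describe a real model.
record = ModelRecord(
    title="Example model",
    description="Placeholder description",
    software=Software(name="Kraken", link="https://example.org/software", version="0.0.0"),
    doi="10.0000/placeholder",
    material="manuscript",
    languages=["fra"],
    scripts=["Latn"],
    license="CC-BY-4.0",
    encoding="UTF-8",
)
```

Leaving out any required property raises a `TypeError` when the record is built, which mirrors the starred/unstarred distinction in the list.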

alix-tz · 2022-12-02T16:45:42Z

I think the software should be one of the first things to appear, because if I'm using Transkribus, I won't care that model X or Y can handle French if they are Kraken models.

Now that raises an important question: given that Transkribus already provides a page listing public transcription models (https://readcoop.eu/transkribus/public-models/), do we want to also cover Transkribus models?

Personally, I would lean in favor of it¹, but it makes things a little more complicated: for example, License, Encoding and DOI² might be impossible to fill in for Transkribus models.

¹ Because 1) it might attract Transkribus users who didn't think of sharing their data/ground truth, 2) users might choose a software depending on the availability of models, 3) we can do better than the current metadata used by Transkribus.
² There is no DOI in Transkribus, but models do have a unique ID.

mittagessen · 2022-12-02T17:00:05Z

Sorry for only starting to participate now. Something that is rather important is a field that indicates the type of model (e.g. transcription, segmentation, reading order, ...) in addition to the software, so that it is possible to filter for what one is actually looking for without having to download individual models. That would probably require changing the semantics of the known characters field to something like "possible outputs".
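As a rough sketch of the filtering this would allow, assuming each catalogue entry carried a `software` and a `model_type` field (both names, and the `outputs` field standing in for "possible outputs", are hypothetical):

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModelEntry:
    title: str
    software: str
    model_type: str            # e.g. "transcription", "segmentation", "reading order"
    outputs: Tuple[str, ...]   # generalised "known characters": whatever the model can emit


def find_models(entries: List[ModelEntry], software: str, model_type: str) -> List[ModelEntry]:
    """Return the entries matching both the software and the model type."""
    return [
        e for e in entries
        if e.software.lower() == software.lower() and e.model_type == model_type
    ]


# Made-up catalogue entries for illustration.
catalogue = [
    ModelEntry("Model A", "Kraken", "transcription", ("a", "b", "c")),
    ModelEntry("Model B", "Kraken", "segmentation", ("MainZone", "DefaultLine")),
    ModelEntry("Model C", "Transkribus", "transcription", ("a", "b", "c")),
]

matches = find_models(catalogue, "kraken", "transcription")
```

With these two fields in the metadata, the selection happens on the records alone, so no model file has to be downloaded to find out what it does.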

As @PonteIneptique correctly identified, models are somewhat ephemeral. In my opinion we should at least provide guidelines on how to deal with that. One (not particularly well thought out) way could be to treat the record/DOI as a 'prototype' model for the given dataset(s) and a particular software, and to publish replacement models (e.g. a tweaked architecture improving performance) as versions linked to that original model instead of creating a completely new record. This is primarily to reduce the noise level in any model repository, but it might have other benefits as well, such as incentivizing early publication of models.
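A minimal sketch of that idea, assuming one 'prototype' record per dataset(s)/software pair with replacement models attached as versions. All names here (`PrototypeRecord`, `ModelVersion`, `publish`, `latest`) and all identifiers are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelVersion:
    version: str
    doi: str
    note: str = ""


@dataclass
class PrototypeRecord:
    title: str
    software: str
    datasets: List[str]
    versions: List[ModelVersion] = field(default_factory=list)

    def publish(self, version: ModelVersion) -> None:
        # A replacement model is appended here instead of opening a new record.
        self.versions.append(version)

    def latest(self) -> Optional[ModelVersion]:
        return self.versions[-1] if self.versions else None


# Placeholder identifiers, for illustration only.
prototype = PrototypeRecord(
    title="Example model",
    software="Kraken",
    datasets=["10.0000/placeholder-gt"],
)
prototype.publish(ModelVersion("1.0", "10.0000/placeholder-model.v1", "initial release"))
prototype.publish(ModelVersion("1.1", "10.0000/placeholder-model.v2", "tweaked architecture"))
```

A repository listing would then show one entry per prototype, with the version history available on demand.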

alix-tz · 2022-12-02T17:06:34Z

Ah, your comment reminds me that we should probably include a "date of creation" property!

tboenig · 2023-03-06T09:35:38Z

Hello all,

Unfortunately I could not take part in the discussion earlier; I would like to pick it up again now.
If I understood everything correctly, there should be:

a schema for GT metadata, and
a schema for models.

The two schemas are closely related in content, but each has its own specifics.

The schema for GT is currently stable.
The schema for models is under development.

My proposal for describing model metadata has always been based on the GT.
Example scenario:
A GT is created and described with metadata. A model is trained on this GT, and the model is recorded in that GT's metadata record.

Of course, there are other scenarios:

I use one very specific GT and create only one model,
or
I combine several GTs, merge them into one GT, and create a model from it.

In the first case, there should be a link between the model and the GT.
In the second case, I would think that the result is actually a new GT, which

gets its own independent metadata record plus a model metadata record,
but the standalone metadata record notes that it was created from GT ...

I have put all of this in natural language first, since I assume the formal notation can then be worked out more easily.
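One possible formalisation of the two scenarios, as a sketch only: a model always points at exactly one GT record, and a merged GT is a new record that lists its sources. The names `GroundTruthRecord`, `derived_from`, `trained_on` and `merge_ground_truth` are assumptions, not part of either schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GroundTruthRecord:
    identifier: str
    title: str
    # Empty for an original GT; lists the source GT identifiers for a merged one.
    derived_from: List[str] = field(default_factory=list)


@dataclass
class ModelRecord:
    title: str
    trained_on: str  # identifier of exactly one GT record


def merge_ground_truth(identifier: str, title: str,
                       sources: List[GroundTruthRecord]) -> GroundTruthRecord:
    """Create a new, independent GT record that notes which GTs it is based on."""
    return GroundTruthRecord(identifier, title, [s.identifier for s in sources])


# Scenario 1: one specific GT, one model.
gt_a = GroundTruthRecord("gt-a", "GT A")
model_a = ModelRecord("Model trained on GT A", trained_on=gt_a.identifier)

# Scenario 2: several GTs merged into a new GT, then one model.
gt_b = GroundTruthRecord("gt-b", "GT B")
gt_merged = merge_ground_truth("gt-merged", "GT A + B", [gt_a, gt_b])
model_ab = ModelRecord("Model trained on merged GT", trained_on=gt_merged.identifier)
```

In both scenarios the model side stays the same (a single link to a GT), and the provenance of a combined GT lives in the GT record itself.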

alix-tz added the question and schema labels on Dec 2, 2022

PonteIneptique mentioned this issue on Oct 26, 2023: Update schema.json #21 (merged)
