TrainingDML-AI Extension Specification

Title: TrainingDML-AI
Identifier: https://stac-extensions.github.io/trainingdml-ai/v1.0.0/schema.json
Field Name Prefix: tdml
Scope: Item, Collection
Extension Maturity Classification: Proposal
Owner: @TrainingDML

This document explains the fields of the Training Data Markup Language for Artificial Intelligence (TrainingDML-AI) Extension to the SpatioTemporal Asset Catalog (STAC) specification. Training data plays a fundamental role in Earth Observation (EO) Artificial Intelligence/Machine Learning (AI/ML), especially Deep Learning (DL). The TrainingDML-AI Extension provides detailed metadata that formalizes the information model of geospatial machine learning training data. This includes, but is not limited to, the following aspects:

· How the training data is prepared, such as provenance or quality;

· How to specify different metadata used for different ML tasks such as scene/object/pixel levels;

· How to differentiate the high-level training data information model and extended information models specific to various ML applications;

· How to introduce external classification schemes and flexible means for representing ground truth labeling.

Examples:
- Dota-v1.5 Dataset:
  - Item 1 example-Dota-v1.5 Dataset: Shows the basic usage of the extension in a STAC Item
  - Item 2 example-Dota-v1.5 Dataset: Shows the basic usage of the extension in a STAC Item
  - Collection example-Dota-v1.5 Dataset: Shows the basic usage of the extension in a STAC Collection
- WHU building Dataset:
  - Item example-WHU building Dataset: Shows the basic usage of the extension in a STAC Item
  - Collection example-WHU building Dataset: Shows the basic usage of the extension in a STAC Collection
- JSON Schema
- Changelog

Collection Fields

The fields in the table below can be used in these parts of STAC documents:

Catalogs
Collections
Item Properties (incl. Summaries in Collections)
Assets (for both Collections and Items, incl. Item Asset Definitions in Collections)
Links

Field Name	Type	Description
tdml:amount_of_training_data	number	REQUIRED. Total number of training samples in the AI training dataset.
tdml:classification_schema	string	Classification schema for classes used in the AI training dataset.
tdml:metrics_in_LIT	[MetricsInLIT Object]	Results of performance metrics achieved by AI/ML algorithms in the peer-reviewed literature.
tdml:image_sizes	[number]	Size of the images used in the EO training dataset.
tdml:scope	Scope Object	Description of the scope of the training dataset.
tdml:quality	Quality Object	Quality description of training datasets.
tdml:provenance	Provenance Object	Provenance information of the training data and training dataset.
tdml:data_sources	[string]	Citation of data sources.

In addition, fields from the following extensions must be imported in the Collection:

the Label Extension Specification to describe properties of a training dataset.
the Scientific Citation Extension to describe DOI of a training dataset.
the Electro-Optical Extension to describe bands of a training dataset.
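As a minimal sketch of how the Collection fields above fit together, the Python snippet below builds a Collection fragment and checks the one field the table marks as required. The schema URL and field names come from this document; the `id` and all field values are hypothetical placeholders, and the helper `missing_required_collection_fields` is an illustrative name, not part of any STAC tooling.

```python
# Identifier of this extension, as given at the top of this document.
TDML_SCHEMA = "https://stac-extensions.github.io/trainingdml-ai/v1.0.0/schema.json"

# The only Collection field that the table marks as required.
REQUIRED_COLLECTION_FIELDS = ["tdml:amount_of_training_data"]


def missing_required_collection_fields(collection: dict) -> list:
    """Return the required tdml fields that are absent from the Collection."""
    return [name for name in REQUIRED_COLLECTION_FIELDS if name not in collection]


# Hypothetical Collection fragment; every value here is a placeholder.
collection = {
    "type": "Collection",
    "stac_extensions": [TDML_SCHEMA],
    "id": "example-training-dataset",
    "tdml:amount_of_training_data": 1000,
    "tdml:classification_schema": "placeholder classification schema",
    "tdml:data_sources": ["placeholder citation of a data source"],
}

print(missing_required_collection_fields(collection))
```

This only checks for the presence of the required field; full validation is done against the JSON Schema as described under "Running tests".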

Item Fields

The fields in the table below can be used in these parts of STAC documents:

Catalogs
Collections
Item Properties (incl. Summaries in Collections)
Assets (for both Collections and Items, incl. Item Asset Definitions in Collections)
Links

Field Name	Type	Description
tdml:quality	Quality Object	Quality description of training datasets.
tdml:provenance	Provenance Object	Provenance information of the training data and training dataset.
tdml:data_sources	[string]	Citation of data sources.

In addition, fields from the following extensions must be imported in the item:

the Label Extension Specification to describe label properties of a training instance.
the ML AOI Extension Specification to describe training type of a training instance.

Additional Field Information

tdml:amount_of_training_data

Total number of training samples in the AI training dataset.

tdml:classification_schema

Classification schema for classes used in the AI training dataset.

tdml:metrics_in_LIT

Results of performance metrics achieved by AI/ML algorithms in the peer-reviewed literature.

tdml:image_sizes

Size of the images used in the EO training dataset. It is recommended to express the image size in the form "width*height". If the images in the dataset differ in size, give the size of the smallest image and the size of the largest image, in the form "minWidth*minHeight~maxWidth*maxHeight".
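The recommended notation can be read mechanically. The sketch below is one possible parser, assuming the size is supplied as a string in exactly one of the two forms described above; note that the field table types tdml:image_sizes as [number], so this covers only the string notation. The function name `parse_image_size` and the sample sizes are hypothetical.

```python
def parse_image_size(text: str):
    """Parse "width*height" or "minWidth*minHeight~maxWidth*maxHeight".

    Returns ((min_width, min_height), (max_width, max_height)).
    For a single size, both pairs are identical.
    """

    def parse_pair(part: str):
        width, height = part.split("*")
        return int(width), int(height)

    if "~" in text:
        smallest, largest = text.split("~")
        return parse_pair(smallest), parse_pair(largest)

    size = parse_pair(text)
    return size, size


# Hypothetical sizes, used only to illustrate the two forms.
print(parse_image_size("512*512"))
print(parse_image_size("800*800~4000*4000"))
```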

tdml:scope

Description of the scope of the training dataset.

tdml:quality

Quality description of training datasets. Quality will be aligned with the DQ_DataQuality class in the ISO 19157:2013 spatial data quality model, and the quality assessment metrics for the sample dataset are described using the quality metric classes defined in ISO 19157:2013.

tdml:provenance

Provenance includes the labeler and the labeling procedure, which map to the agent and the activity, respectively, in the W3C PROV model. The labeler identifies the agent that creates the training dataset or individual samples, and the labeling procedure represents the process for data generation.

tdml:data_sources

Citation of data sources.

MetricsInLIT Object

The MetricsInLIT Object, used in the field tdml:metrics_in_LIT, records the performance metrics that AI/ML algorithms achieved on the training dataset, as reported in the peer-reviewed literature.

Field Name	Type	Description
doi	string	REQUIRED. Digital object identifier of the peer-reviewed literature.
algorithm	string	AI/ML algorithms used in the peer-reviewed literature.
metrics	object	REQUIRED. Metrics and results of AI/ML algorithms in the peer-reviewed literature.

An example of YOLOv5's MetricsInLIT on the DOTA-v1.5 dataset:

{
    "doi":"10.5281/zenodo.3983579",
    "algorithm": "YOLOV5",
    "metrics":[
        {
            "name": "AP50",
            "value": "66.1"
        },
        {
            "name": "AP50:95",
            "value": "41.5"
        },
        {
            "name": "AR1",
            "value": "39.4"
        },
        {
            "name": "AR10",
            "value": "54.9"
        },
        {
            "name": "AR100",
            "value": "58.4"
        }
    ]
}
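A quick presence check for the two fields marked REQUIRED in the MetricsInLIT table could look like the following sketch. The entry reuses two metrics from the example above; the helper name `check_metrics_in_lit` is hypothetical and the check does not replace validation against the JSON Schema.

```python
# Fields marked REQUIRED in the MetricsInLIT table.
REQUIRED_METRICS_FIELDS = ("doi", "metrics")


def check_metrics_in_lit(entry: dict) -> list:
    """Return one message per required field missing from the entry."""
    return [
        f"missing required field: {name}"
        for name in REQUIRED_METRICS_FIELDS
        if name not in entry
    ]


# Shortened copy of the example above.
entry = {
    "doi": "10.5281/zenodo.3983579",
    "algorithm": "YOLOV5",
    "metrics": [
        {"name": "AP50", "value": "66.1"},
        {"name": "AP50:95", "value": "41.5"},
    ],
}

print(check_metrics_in_lit(entry))
print(check_metrics_in_lit({"algorithm": "YOLOV5"}))
```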

Quality Object

The Quality Object, used in the field tdml:quality, describes the quality of the training dataset.

Field Name	Type	Description
scope	[Scope Object]	REQUIRED. Scope to which the quality information applies.
report	[QualityElement Object]	Quality reports about the training dataset.

Provenance Object

The Provenance Object, used in the field tdml:provenance, describes the provenance of the training data and the training dataset.

Field Name	Type	Description
scope	[Scope Object]	REQUIRED. Scope to which the labeling information applies.
labeling_methods	[string]	Methods used in the labeling procedure.
labeling_tools	[string]	Tools or software used in the labeling procedure.
labeler_names	[string]	Names of the labelers.

Scope Object

Field Name	Type	Description
level	string	REQUIRED. The applicable level of data.
level_description	object	REQUIRED. A more detailed description of the level to better understand the scope of application of the data.

QualityElement Object

The QualityElement Object describes elements related to quality or, more specifically, to bias, which can be used to reduce errors when applying AI/ML. For example, any knowledge of imbalance or mislabeling in the training data (TD) can be stored in the TD quality.

Field Name	Type	Description
type	string	REQUIRED. Type of evaluation quality.
measure	string	REQUIRED. Reference to measure used.
evaluation_method	string	REQUIRED. Evaluation information.
result	string	Value obtained from applying a data quality measure.
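To show how the Quality, Scope and QualityElement Objects nest, here is a sketch that checks the fields the tables mark as REQUIRED. It assumes, following the bracketed types in the Quality table, that scope and report are both arrays. All values in the sample object are hypothetical placeholders, and `check_quality` is an illustrative helper rather than part of the extension.

```python
# Required fields of the objects nested under each Quality field,
# taken from the Scope Object and QualityElement Object tables.
REQUIRED_NESTED_FIELDS = {
    "scope": ("level", "level_description"),
    "report": ("type", "measure", "evaluation_method"),
}


def check_quality(quality: dict) -> list:
    """Return one message per missing required field in a Quality Object."""
    problems = []
    if "scope" not in quality:
        problems.append("missing required field: scope")
    for key, required in REQUIRED_NESTED_FIELDS.items():
        for index, item in enumerate(quality.get(key, [])):
            for name in required:
                if name not in item:
                    problems.append(f"{key}[{index}] is missing {name}")
    return problems


# Hypothetical Quality Object; every value is a placeholder.
quality = {
    "scope": [
        {"level": "placeholder level", "level_description": {"note": "placeholder"}}
    ],
    "report": [
        {
            "type": "placeholder type",
            "measure": "placeholder measure",
            "evaluation_method": "placeholder method",
            "result": "placeholder result",
        }
    ],
}

print(check_quality(quality))
```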

Relation types

It is highly recommended to use version-history as a rel type in the Link Object to record, as a changeset, the training samples that changed between two versions at the collection level. The changeset is used to track updates made to a specific version of a sample dataset, identified by its "datasetId" and "version".

There are three types of updates for sample data units: "add" for adding new sample data units, "modify" for modifying existing sample data units, and "delete" for removing sample data units. "Modify" includes changes to metadata of sample data, changes to original data used in sample data, and additions, modifications, and deletions of all labeled objects in the sample data.
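The three update types can be illustrated with a small sketch that applies a changeset to an in-memory set of samples. The layout used here (a dict with "add", "modify" and "delete" keys, and samples keyed by an "id") is an assumption made for illustration only; this document does not define the changeset encoding.

```python
def apply_changeset(samples: dict, changeset: dict) -> dict:
    """Return a new sample mapping with the changeset applied.

    `samples` maps a sample id to its record. The input is not modified.
    """
    updated = dict(samples)
    for sample in changeset.get("add", []):
        updated[sample["id"]] = sample
    for sample in changeset.get("modify", []):
        if sample["id"] not in updated:
            raise KeyError(f"cannot modify unknown sample: {sample['id']}")
        updated[sample["id"]] = sample
    for sample_id in changeset.get("delete", []):
        del updated[sample_id]
    return updated


# Hypothetical samples and changeset.
samples = {
    "a": {"id": "a", "label": "ship"},
    "b": {"id": "b", "label": "plane"},
}
changeset = {
    "add": [{"id": "c", "label": "harbor"}],
    "modify": [{"id": "a", "label": "large-vehicle"}],
    "delete": ["b"],
}

result = apply_changeset(samples, changeset)
print(sorted(result))
```

Returning a new mapping rather than editing in place keeps the previous version available, which matches the idea of recording differences between two versions.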

Best Practices

Core and other extensions fields

It is highly recommended to use the following fields to describe the training dataset:

Field name	TrainingDML-AI usage
providers	People or organizations who provide the AI training dataset.
label:overviews	Statistics results of training samples in each class.
label:classes	REQUIRED. Classes used in the AI training dataset.
label:tasks	REQUIRED. Type description of the EO task.
label:methods	Methods used in the labeling procedure.
ml-aoi:split	Training type of the individual training instance.
eo:bands	Description of the image bands used in the EO training dataset.
sci:doi	Digital object identifier of the AI training dataset.

Contributing

All contributions are subject to the STAC Specification Code of Conduct. For contributions, please follow the STAC specification contributing guide. Instructions for running tests are copied here for convenience.

Running tests

The same checks that run on PRs are part of the repository and can be run locally to verify that changes are valid. To run tests locally, you'll need npm, which is a standard part of any Node.js installation.

First you'll need to install everything with npm once. Just navigate to the root of this repository and on your command line run:

npm install

Then to check markdown formatting and test the examples against the JSON schema, you can run:

npm test

This prints the same output that you see online, so you can then fix your markdown or examples.

If the tests reveal formatting problems with the examples, you can fix them with:

npm run format-examples
