diff --git a/README.md b/README.md
index 36993a0..ac00a3e 100644
--- a/README.md
+++ b/README.md
@@ -31,16 +31,19 @@ This field is optional. If not provided, its expected that the split property wi
 
 #### bbox and geometry
 
-* `ml-aoi` Multiple items may reference the same label and image item by scoping the `bbox` and `geometry` fields. TODO: Better describe scoping of overlap between raster and label items?
-* `ml-aoi` Items `bbox` field may overlap when they belong to different `ml-aoi:split` set.
-* `ml-aoi` Items in the same Collection should never have overlapping `geometry` fields.
+- `ml-aoi` Multiple items may reference the same label and image item by scoping the `bbox` and `geometry` fields. TODO: Better describe scoping
+  of overlap between raster and label items?
+- `ml-aoi` Items `bbox` field may overlap when they belong to different `ml-aoi:split` set.
+- `ml-aoi` Items in the same Collection should never have overlapping `geometry` fields.
 
 ## Links
 
 `ml-aoi` Item must link to both label and raster STAC items valid for its area of interest.
 These Link objects should set `rel` field to `derived_from` for both label and feature items.
 
-`ml-aoi` Item should be contain enough metadata to make it consumable without the need for following the label and feature link item links. In reality this may not be practical because the use-case may not be fully known at the time the Item is generated. Therefore it is critical that source label and feature items are linked to provide the future consumer the option to collect additional metadata from them.
+`ml-aoi` Item should be contain enough metadata to make it consumable without the need for following the label and feature link item links. In
+reality this may not be practical because the use-case may not be fully known at the time the Item is generated. Therefore it is critical that
+source label and feature items are linked to provide the future consumer the option to collect additional metadata from them.
 
 | Field Name | Type | Name | Description |
 | ------------- | ------ | ---- | --------------------------- |
@@ -127,7 +130,7 @@ If the tests reveal formatting problems with the examples, you can fix them with
 npm run format-examples
 ```
 
-# Design Decisions
+## Design Decisions
 
 Central choices and rational behind them is outlined in the ADR format:
 
diff --git a/docs/0001-record-architecture-decisions.md b/docs/0001-record-architecture-decisions.md
index a3e958d..4ac1d48 100644
--- a/docs/0001-record-architecture-decisions.md
+++ b/docs/0001-record-architecture-decisions.md
@@ -1,7 +1,5 @@
----
-id: 0001-record-architecture-decisions
-title: 1 - Recording Architecture Decisions
----
+# 1. Record architecture decisions
+
 Date: 2020-08-08
 
 ## Status
diff --git a/docs/0002-use-case-definition.md b/docs/0002-use-case-definition.md
index dee307c..35eba0d 100644
--- a/docs/0002-use-case-definition.md
+++ b/docs/0002-use-case-definition.md
@@ -1,7 +1,5 @@
----
-id: 0002-use-case-definition
-title: 2 - Use Case
----
+# 2. Use case definition
+
 Date: 2020-08-10
 
 ## Status
@@ -31,7 +29,8 @@ For instance it is possible to apply a single source of ground-truth building la
 `ml-aoi` Item links to both raster STAC item and label STAC item.
 In this relationship the source raster and label items are static and long lived, being used by several `ml-aoi` catalogs.
 By contrast `ml-aoi` catalog is somewhat ephemeral, it captures the training set in order to provide model reproducibility and provenance.
-There can be any number of `ml-aoi` catalogs linking to the same raster and label items, while varying selection, training/testing/validation split and class configuration.
+There can be any number of `ml-aoi` catalogs linking to the same raster and label items, while varying selection, training/testing/validation split
+and class configuration.
 
 ## Decision
 
@@ -40,4 +39,4 @@ We will adopt the use and development of `ml-aoi` extension in future machine-le
 ## Consequences
 
 We will not longer attempt to use `label` extension as a sole source of training data for ML models.
-We will continue development of tools to both produce and consume `ml-aoi` extension catalogs.
\ No newline at end of file
+We will continue development of tools to both produce and consume `ml-aoi` extension catalogs.
diff --git a/docs/0003-test-train-validation-split.md b/docs/0003-test-train-validation-split.md
index ca5e1d9..96e8bd8 100644
--- a/docs/0003-test-train-validation-split.md
+++ b/docs/0003-test-train-validation-split.md
@@ -1,7 +1,5 @@
----
-id: 0003-test-train-validation-split
-title: 3 - Test Train Validation Split
----
+# 3. Test-train-validation split
+
 Date: 2020-08-10
 
 ## Status
@@ -22,7 +20,7 @@ The which items are selected for this split will effect model performance and sh
 In context of a STAC catalog there are multiple ways to express the data split.
 This ADR explores available options and their consequences.
 
-##### Split by Collection
+### Split by Collection
 
 Split could be generated by generating a separate collection for each set.
 This is a flexible approach. However, the grouping of these collections into one cohesive training set would have to be done by convention, for instance by prefix on collection `id`.
@@ -33,7 +31,7 @@ Additionally the convention of how to associate training with testing with valid
 Further it would be easy to include a single item in both training and testing set without realizing it.
 This is not a good choice for these reasons.
 
-##### Split by Link property
+### Split by Link property
 
 The top-most `ml-aoi` collection has to link to each item or child catalogs.
 These links could have additional property that designates the split.
@@ -43,7 +41,7 @@ However, when ingested into STAC API this link property is often lost and is not
 Thus the split set membership would not be visible to through STAC API, which is bad.
 This is not a good choice for that reason.
 
-##### Split by Item property
+### Split by Item property
 
 Each item could have an extension specific property (ex: `ml-aoi:split`) that designates set membership.
 This approach addresses the short-comings of the previous methods.
diff --git a/docs/0004-multiple-label-items.md b/docs/0004-multiple-label-items.md
index b98bc2d..5cd3e97 100644
--- a/docs/0004-multiple-label-items.md
+++ b/docs/0004-multiple-label-items.md
@@ -1,7 +1,5 @@
----
-id: 0003-multiple-label-items
-title: 1 - Multiple Label Items
----
+# 4. Multiple label items
+
 Date: 2020-08-11
 
 ## Status
@@ -15,7 +13,7 @@ This would be a useful feature for training multi-class classifiers.
 One can imagine having a label STAC item for buildings and separate STAC item for fields.
 STAC Items Links object is an array, so many label items could be linked to from a single `ml-aoi` STAC Item.
 
-#### Limiting to single label link
+### Limiting to single label link
 
 Limiting to single label link however is appealing because the label item metadata could be copied over to `ml-aoi` Item.
 This would remove the need to follow the link for the label item during processing.
@@ -25,7 +23,7 @@ If multi-class label dataset would be required there would have to be a mechanic
 existing labels into a single STAC `label` item.
 This could mean either union of GeoJSON FeatureCollections per item or a configuration of a more complex STAC `label` Item that links to multiple label assets.
 
-#### Allowing multiple labels
+### Allowing multiple labels
 
 The main appeal of consuming multi-label `ml-aoi` items is that it would allow referencing multiple label sources, some which could be external, without the need for pre-processing and thus minimizing data duplication.
@@ -51,4 +49,4 @@ The resulting label catalog can capture that design and iteration required for t
 
 `ml-aoi` Items can copy all `label` extension properties from the `label` Item.
 In effect `ml-aoi` Items extends `label` item by adding links to feature imagery.
-This formulation lines up with original problem statement for `ml-aoi` extension.
\ No newline at end of file
+This formulation lines up with original problem statement for `ml-aoi` extension.