Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide on whether assets should be a list or a dictionary. #198

Closed
cholmes opened this issue Aug 24, 2018 · 9 comments
Closed

Decide on whether assets should be a list or a dictionary. #198

cholmes opened this issue Aug 24, 2018 · 9 comments
Labels
prio: must-have required for release associated with
Milestone

Comments

@cholmes
Copy link
Contributor

cholmes commented Aug 24, 2018

Lots of discussion in #151. In #197 we decided to change links back. This issue is to decide for 0.6.0 if we also change assets back to links.

This should be the last time we futz with these things.

I believe the state of things is:

The argument for assets going to list is for consistency across assets and links (though everyone decided it was ok for those to be different), and that requiring keys to be created will lead to providers just autogenerating random keys, so they become less useful. Plus the order of a list may be useful (display the first thumbnail or COG). And that a dictionary is just a shortcut, and any program should be able to iterate for a 'key' (like 'id') in a list.

The argument for assets staying as dictionaries is that it's potentially useful to have unique keys. We can specify less in the assets types with a general definition at the dataset level that the key can be a lookup for. Plus it's what we're doing now, and there were good reasons. Happy for @matthewhanson to fill in more here, I feel I'm not doing the best job presenting the 'stay as dict' argument.

@cholmes cholmes added the prio: must-have required for release associated with label Aug 24, 2018
@cholmes cholmes added this to the 0.6.0 milestone Aug 24, 2018
@cholmes
Copy link
Contributor Author

cholmes commented Aug 24, 2018

cc @mojodna and @hgs-trutherford - I now remember Tim also felt pretty strongly we should have lists for assets.

@ghost
Copy link

ghost commented Aug 28, 2018

I'm on the array side of the argument for sure. I'm not sure I would say "strongly", but I don't think the names are going to be well known most of the time. We are just going to end up with a whole bunch of "B01", "b1", "band1", "1", etc names that I think are going to make it confusing. The only one that makes sense to me as a key is "thumbnail", but who's to say we wouldn't want more than one thumbnail? Thumbnails can be in multiple resolutions or band combinations. That leads me to think a "type" property is more useful than a key name.

If we go the dictionary route, I would still want to add a type to make the assets browse-able.

@cholmes
Copy link
Contributor Author

cholmes commented Aug 29, 2018

#175 brings mime_type in to asset, and I do think we should change it to just 'type' to align with links (I need to make an issue on it).

I do think if we go with dictionaries it's essential that we make a list of 'recommended keys', for things like thumbnail (to be the default thumbnail displayed) cogs, metadata, etc. I'm not sure exactly how to do them, do you do like metadata_xml? metadata_iso19115? And indeed probably even recommendations on how to form them. Then we can make a recommendation of how to do the band name options.

@m-mohr
Copy link
Collaborator

m-mohr commented Aug 31, 2018

We have so many issues and PRs on the topic arrays vs. objects, especially for assets, that arguments are separated over multiple issues and PRs (#112, #114, #127, #151, #197 and #142 for bands). I always like to have pros/cons lists, so I'll try to make a complete one here.

Arguments for arrays:

  1. No duplication of rel types in the keys, but that isn't always the case anyway (e.g. two assets with the same rel type) (update collection extension specification #112)
  2. No need to create keys, which may be auto-generated anyway. How would multiple thumbnails be named? (Update catalog.json to new 'links' style #127)
  3. Objects for mapping could be generated from arrays that have a "unique id" (Update catalog.json to new 'links' style #127)
  4. A non-semantic element is introduced (the key); implementations may apply meaning to this in ways that potentially conflict with "better" resolution rules. (Update catalog.json to new 'links' style #127)
  5. JSON objects have no inherent ordering, assets have => resulting arrays won't necessarily be consistent across viewers and one could get the first cog or thumbnail (Update catalog.json to new 'links' style #127)

Arguments for objects (dictionaries):

  1. Easier to merge (i.e. merge collection data into item data) (Update catalog.json to new 'links' style #127)
  2. Merging has a predictable behavior (problem: duplicated ids hold by arrays) (Update catalog.json to new 'links' style #127)

Somewhat "neutral":

  1. Both would probably need some kind of list where common types are listed, e.g. cog, thumbnail, metadata_xyz, ...
  2. Retrieving elements would be easier or more complex depending on the use case: getting the first element would be easier with arrays, getting a named element (by key) would be easier with objects.
  3. There was an additional argument in Update catalog.json to new 'links' style #127 regarding modelling STAC as Postgres DDL, but I didn't quite got that so I list it here for now.
  4. Consistency with links was an argument for arrays in Decide on whether assets should be a list or a dictionary. #198, though everyone decided it was ok for those to be different (=> neutral)

Did I pick up all? I feel I missed something for the objects? I tried to be as unbiased as possible - at least I would have voted neutral on arrays vs. objects when writing this list.

Some comments from my side:

  • Array 1: I think that's only a minor issue.
  • Array 2/Array 4: Array 4 would be obsolete with a list of common "types" (see also neutral 1) - not to be confused with mime types. Array 2 would be obsolete if we also add guidelines for naming when multiple elements occur. Both make the spec lengthy though.
  • Array 3: Valid reason, but I am not sure whether we need to generate a unique identifier for assets, that may just be the absolute URL.
  • Neutral 2/Array 5: I was not really sure whether "neutral 2" is an argument for objects or neutral. Reason: I think it's not really a good use case to rely on the order of assets anyway as it is unpredictable how providers order them. That could also void argument "array 5".
  • Object 1: Matthew mentioned that objects are most important when used with the collection extension. If we drop it (Rename or Remove the Collection Extension to be less confusing #174) it could get obsolete to merge.
  • Object 2: The validator should ensure unique keys in arrays. I think we have similar rules in other parts of the spec, e.g. a unique name across datasets / items, which are not really enforced yet.

So, either way we should create a list of asset groups. I think they may just be pretty generic based on actual use cases. For example: thumbnail, metadata, data (some name to group the actual granules/files/raw data), other (what else could be referenced as asset?). More specific differentiation could be done based on the mime type, e.g. metadata_xml as proposed by Chris could just be resolved by group==metadata && type==text/xml, cogs could be retrieved by group==data && type=image/cog+geotiff etc. Sure there are things we can't always resolve (ISO 19115 metadata is also xml!?) [By the way, should metadata be in assets or links?]

Now that I have this list and go through the arguments, I tend slightly to vote for arrays due to the reason that it seems unreasonable to create additional keys. But this whole thing also depends a little on the outcomes of #174 as some reasons for objects are based on collections (the extension) being in the spec.

Edit:
Regarding the MIME types for metadata (probably is better suited in another issue, but not sure where to put it). The CEOS OpenSearch Best Practice Document, pages 38-39 lists some vendor MIME types, which we could also adapt:

  • ISO 19139 application/vnd.iso.19139+xml
  • ISO 19139-2 application/vnd.iso.19139-2+xml
  • ISO 19115-3 application/vnd.iso.19115-3+xml
  • OGC 10-157r3 application/gml+xml;profile=http://www.opengis.net/spec/EOMPOM/1.0
  • OGC 10-157r4 application/gml+xml;profile=http://www.opengis.net/spec/EOMPOM/1.1
  • Dublin core7 application/xml
  • Markdown text/x-markdown

@m-mohr m-mohr removed the prio: must-have required for release associated with label Oct 8, 2018
@m-mohr m-mohr removed this from the 0.6.0-RC1 milestone Oct 8, 2018
@matthewhanson
Copy link
Collaborator

We ended up keeping arrays as a dictionary, and there hasn't been discussion since so I'm going to close this. Feel free to reopen or create a new ticket if it needs to be discussed again.

@m-mohr
Copy link
Collaborator

m-mohr commented Mar 28, 2019

Well, we said some months ago that we shouldn't change this until we are discussing the asset schemas as we don't want to change again when we discuss the schemas. So I'm not sure whether your conclusion @matthewhanson from the inactivity are correct.

@tomkralidis
Copy link
Contributor

What about casting assets as a list of Link objects, whose properties would be able to cover what we need. In addition, echo'ing @hgs-truthe01 in that we would end up with many meaningless names being used to simply satisfy the content model.

@m-mohr m-mohr reopened this Sep 19, 2019
@m-mohr
Copy link
Collaborator

m-mohr commented Oct 15, 2019

Now that I'm starting to generate STAC Items from job results the dictionary is getting painful as I need to create random property keys. At the moment I'm just basically adding 1,2,3,4,etc. as strings, but that just doesn't feel right. As we generate random names anyway and don't really have a convention regarding naming, it feels like the property keys don't really add a value. Therefore I'd vote to change to array and add a rel type instead. I think this is a think we could finally decide on during the next sprint. Still, for arrays it will be harder to have asset schemas and we would need to find a solution for that.

@m-mohr m-mohr added this to the 0.9.0 milestone Oct 15, 2019
@m-mohr m-mohr added the assets label Oct 28, 2019
@m-mohr
Copy link
Collaborator

m-mohr commented Nov 8, 2019

Was discussed during the sprint: We'll stick with objects, but use the key names more as id. For communication of the purpose of an asset, the a role property will be added (see #651 )

ToDo for this issue: Explain why the object key is useful (asset schemas) and how it should be used (identifier-like, not necessarily communicate the purpose and use role instead).

@m-mohr m-mohr mentioned this issue Nov 8, 2019
3 tasks
@m-mohr m-mohr added the prio: must-have required for release associated with label Nov 13, 2019
@m-mohr m-mohr added PR ✓ and removed sprint #5 labels Nov 25, 2019
@m-mohr m-mohr closed this as completed Nov 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
prio: must-have required for release associated with
Projects
None yet
Development

No branches or pull requests

4 participants